gabriben/awesome-generative-information-retrieval

A field guide to when LLMs stop hallucinating and start retrieving

A curated map of the messy, expanding territory where generative models meet traditional search.

Velocity (7d): +0.6 ★ / day · Trend: steady

What it does

This is an “awesome” list (curated, not coded) that catalogs papers, tools, datasets, and workshops at the intersection of generative AI and information retrieval. It splits the field into two rough camps: models that ground their answers in external sources (RAG, search augmentation) and models that generate document identifiers directly, folding the retrieval step into the model itself.
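To make the two camps concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the three-document corpus, the word-overlap scorer standing in for a learned model, and the function names are hypothetical and do not come from the list or from any paper it cites.

```python
from collections import Counter

# Hypothetical toy corpus: docid -> text.
CORPUS = {
    "doc-rag": "retrieval augmented generation grounds answers in external sources",
    "doc-ids": "generative retrieval models emit document identifiers directly",
    "doc-rec": "generative recommendation suggests items to users",
}


def overlap(query: str, text: str) -> int:
    """Word-overlap score; a crude stand-in for a learned relevance model."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())


def retrieve_then_generate(query: str, k: int = 1) -> str:
    """Camp 1: fetch the top-k passages, then pass them to a generator as context."""
    ranked = sorted(CORPUS, key=lambda d: overlap(query, CORPUS[d]), reverse=True)
    context = " ".join(f"[{d}] {CORPUS[d]}" for d in ranked[:k])
    # A real system would call an LLM here; this sketch only builds the grounded prompt.
    return f"Context: {context}\nQuestion: {query}"


def generate_docid(query: str) -> str:
    """Camp 2: emit a docid one character at a time, constrained to valid identifiers."""
    prefix = ""
    while prefix not in CORPUS:
        # Only characters that keep the prefix on a path to a real docid are allowed.
        allowed = sorted({d[len(prefix)] for d in CORPUS if d.startswith(prefix)})

        def score(ch: str) -> int:
            # Stand-in for the model's next-token score given the query and prefix.
            return max(
                overlap(query, CORPUS[d])
                for d in CORPUS
                if d.startswith(prefix + ch)
            )

        prefix += max(allowed, key=score)
    return prefix
```

With this toy corpus, `generate_docid("which models emit document identifiers")` returns `"doc-ids"` without ever running a separate ranking pass, while `retrieve_then_generate` ranks first and generates second. The sketch assumes no docid is a prefix of another; a real generative retriever would use a trained sequence model and a token-level trie in place of the overlap scorer.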

The interesting bit

The list itself is a sign of institutional confusion. The maintainers had to invent categories like “Generative Document Retrieval” and “Live Generative Retrieval” because the field hasn’t settled its own boundaries. There’s even an “Epistemology Papers” section: academics are now debating, as a matter of philosophy, whether ChatGPT is “bullshit” (actual paper title, cited).

Key highlights

  • Heavy on RAG variants: retrieval at inference time, memory manipulation, re-ranking, constrained generation, multimodal pipelines
  • Includes generative recommendation and knowledge graphs as adjacent territories
  • Curated datasets for evaluation: FACTSCORE, FACTKB, BRIGHT, LegalBench, TruthfulQA
  • Tools section covers Microsoft’s GraphRAG, HuggingFace’s TRL, and PrimeQA
  • Tracks workshop lineage: SIGIR hosted the first two “Generative Information Retrieval” workshops (2023, 2024)

Caveats

  • No code in the repository itself—this is purely a reading list
  • Coverage feels slanted toward 2023–2024; older foundational work is thin
  • “Pull-requests welcome” suggests the taxonomy is still actively contested

Verdict

Worth bookmarking if you’re building RAG systems and need to survey the landscape without drowning in arXiv. Skip it if you want implementation details or a stable, settled field—this is a snapshot of a discipline still arguing with itself about what it even is.
