frutik/awesome-search

A 15-year search veteran's curated reading list

A hand-organized index of papers, talks, and blog posts covering everything from BM25 to vector chunking to search-team hiring.

1.5k stars Shell RAG · SearchLearning
Velocity (7d): +0.6 ★ / day · Trend: steady

What it does

This is a curated bibliography of search-related resources, maintained by a developer with 15+ years in e-commerce search. Links are grouped by topic—lexical search, semantic vectors, query understanding, UX, evaluation metrics, team management, even the economics of search—and cross-posted when they fit multiple categories.

The interesting bit

The table of contents alone reads like a graduate syllabus that knows industry exists. It doesn’t just list “vector search”; it breaks down bi-encoders vs. cross-encoders vs. ColBERT, dense vs. sparse vectors, Matryoshka embeddings, and quantization. That level of detail suggests the curator has had to debug these things.

Key highlights

  • Covers the full stack: relevance algorithms (BM25, Bayesian BM25), ranking, personalization, diversification, zero-result handling
  • Semantic search section goes deep on encoder architectures, chunking strategies, and dimensionality reduction
  • Includes practical concerns often skipped: A/B testing, GDPR/tracking, hiring for search teams, case studies
  • Heavy representation from practitioner voices—Etsy, Baymard Institute, Daniel Tunkelang, OpenSearch, Vespa
  • Sister repos for e-commerce, knowledge graphs, and cloud apps

Caveats

  • No code, no tools to run—purely a reading list (the repo language is “Shell” only because of a doctoc-generated TOC)
  • Some sections are sparsely populated or empty (cross-encoders, multimodal search)
  • “Unsorted” bucket at the bottom suggests ongoing curation debt

Verdict

Worth bookmarking if you’re building or refining search and tired of piecing together scattered Medium posts. Skip it if you want executable libraries or a quick tutorial.
