datawhalechina/all-in-rag

A Chinese cookbook for building RAG systems that won't hallucinate

A systematic, hands-on tutorial covering the full RAG stack from chunking to graph-based retrieval.

Velocity (7d): +23 ★ / day · Trend: steady

What it does

All-in-RAG is a comprehensive Chinese-language tutorial for building retrieval-augmented generation systems. It walks through the entire pipeline: data loading and text chunking, vector and multimodal embeddings, vector database construction with Milvus, hybrid retrieval, query rewriting, Text2SQL, formatted generation, and system evaluation. Two end-to-end projects are included, with an optional Graph RAG optimization track using Neo4j.
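To make the first stages of that pipeline concrete, here is a minimal sketch of chunking, embedding, retrieval, and prompt assembly. It is not taken from the tutorial: the function names are hypothetical, the word-count "embedding" is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database such as Milvus.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt; the generation step itself is left out."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n{joined}\nQuestion: {query}"
```

In a real system, `embed` would call an embedding model and `retrieve` would query the vector index; the shape of the flow stays the same.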

The interesting bit

Most RAG content is scattered across blog posts; this project attempts to be a single curriculum. It also accepts community contributions as “Extra Chapters”: one contributor has already added a simple Neo4j application, and another is refining a multimodal omni-embedding walkthrough built on Jina v5-omni.

Key highlights

  • Ten chapters progressing from “four steps to RAG” to advanced architecture
  • Covers multimodal embeddings (text + image) and hybrid search (dense + sparse)
  • Includes practical Milvus deployment and index optimization
  • Graph RAG chapter with knowledge graph construction and intelligent query routing
  • Systematic evaluation chapter with tools and metrics, often skipped in tutorials
  • Extra Chapter section open to community submissions, with a review process
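The hybrid search item above (dense + sparse) comes down to merging two ranked result lists. One common way to do that is reciprocal rank fusion (RRF), sketched below. This is an illustration under assumptions, not the tutorial's confirmed method: the function name and document IDs are hypothetical, and `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1. Higher total score ranks first; ties are
    broken alphabetically by ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (vector) and a sparse (keyword) retriever.
dense_hits = ["d2", "d1", "d3"]
sparse_hits = ["d1", "d4", "d2"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF uses only rank positions, so it avoids having to normalize dense similarity scores and sparse keyword scores onto a common scale.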

Caveats

  • Primary content is in Chinese; an English README exists, but how much of the material it covers is unclear
  • Chapter 10 (second project) is marked “规划中” (in planning) and is not yet available
  • Some Extra Chapter content is noted as “优化中” (being optimized)
  • Docker and basic Linux command line skills are expected prerequisites

Verdict

Worth bookmarking if you read Chinese and want structured RAG training rather than piecemeal Medium articles. Skip it if you need a mature, production-hardened framework or prefer English-first documentation.
