wxywb/history_rag

ChatGPT meets 24 Dynastic Histories: a RAG experiment

A Chinese-history Q&A system that grounds LLM answers in actual classical texts instead of letting GPT-4 hallucinate about Guan Yu.

1k stars · Python · RAG · Search · Language Models
Velocity (7d): +1.2 ★/day · Trend: steady

What it does

This project wires up a RAG pipeline so you can ask questions about Chinese history, such as “Did Guan Yu really have his bone scraped to treat a poisoned wound?”, and get answers backed by passages retrieved from classical texts rather than GPT-4’s imagination. It ships with two setups: a local Milvus + LlamaIndex stack using the BAAI/bge-base-zh-v1.5 embedding model, or a managed Zilliz Cloud Pipelines route for larger corpora. Either way, OpenAI GPT-4 handles the final generation.
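To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the repo's code: the character-bigram `embed` function is a toy stand-in for a real embedding model such as BAAI/bge-base-zh-v1.5, the in-memory list stands in for Milvus, the passages are illustrative English paraphrases rather than corpus text, and every function name is hypothetical. The final prompt is what would be handed to the LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: character-bigram counts."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, retrieved):
    """Assemble a grounded prompt; the LLM call itself is omitted."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Answer using only the passages below and cite them by number.\n"
        f"{context}\nQuestion: {question}"
    )

# Illustrative passages (assumed for this example, not taken from the repo)
passages = [
    "Guan Yu was struck by a stray arrow that pierced his left arm.",
    "Zhuge Liang led the northern expeditions against Wei.",
    "The physician cut open Guan Yu's arm and scraped the bone to remove the poison.",
]
question = "Did Guan Yu have his bone scraped to remove poison?"
top = retrieve(question, passages)
prompt = build_prompt(question, top)
print(prompt)
```

The point of the design is that the model only sees text that was actually retrieved, so a wrong answer can be traced back to either a bad retrieval or a bad reading of a good one.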

The interesting bit

The repo is essentially a well-documented tutorial in glue code, but the glue is tuned for a real pain point: classical Chinese history is exactly the kind of niche, high-stakes domain where LLMs love to confabulate dates, names, and events. The author includes the full “Twenty-Four Histories” corpus and structures ingestion around the traditional jizhuanti (纪传体) biography format (chapter titles like “某某传” followed by indented body text), so citations map cleanly back to source material.
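A chunker for that layout could look roughly like the sketch below. It assumes, as a simplification, that a title is any non-blank line with no leading whitespace and that body lines start with a space, tab, or full-width space (U+3000). The function name and the two-entry sample are illustrative and are not the repo's actual ingestion code or file layout.

```python
def split_biographies(text):
    """Group indented body lines under the most recent unindented title line."""
    chunks = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in (" ", "\t", "\u3000"):
            # Indented line: body text belonging to the current title
            if current is not None:
                current["paragraphs"].append(line.strip())
        else:
            # Unindented line: start a new chapter, e.g. a "某某传" title
            current = {"title": line.strip(), "paragraphs": []}
            chunks.append(current)
    return chunks

# Simplified sample in the described layout (assumed for illustration)
sample = (
    "关羽传\n"
    "\u3000\u3000关羽字云长，本字长生，河东解人也。\n"
    "\u3000\u3000羽尝为流矢所中，贯其左臂。\n"
    "张飞传\n"
    "\u3000\u3000张飞字益德，涿郡人也。\n"
)
chunks = split_biographies(sample)

# Pair every paragraph with its chapter title so an answer can cite its source
citable = [(c["title"], p) for c in chunks for p in c["paragraphs"]]
for title, paragraph in citable:
    print(f"{title}: {paragraph}")
```

Keeping the title attached to each paragraph is what lets a retrieved chunk be cited as "from the biography of X" instead of as an anonymous snippet.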

Key highlights

  • Two deployment paths: local Dockerized Milvus for tinkerers, Zilliz Cloud Pipelines for scale without ops headaches.
  • Supports swapping in local LLMs (via fastchat), Gemini, or Qwen for the OpenAI-averse; config lives in cfgs/config.yaml.
  • Added reranker support in the June 2024 update for better retrieval ranking.
  • Gradio web UI available if CLI nostalgia wears thin.
  • Debug flag (ask -d) exposes the raw retrieved chunks so you can audit what the model is actually reading.
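The reranking and debug-inspection ideas above can be sketched as a second scoring pass over first-stage candidates, with the scored chunks printed for auditing. This is a generic illustration, not the repo's implementation: real rerankers are usually cross-encoder models, whereas `term_coverage` here is a toy word-overlap scorer, and all names and sample strings are hypothetical.

```python
def term_coverage(query, text):
    """Toy scorer: fraction of distinct query words that appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0

def rerank(query, candidates, score_fn, top_n=2):
    """Re-score first-stage candidates with score_fn and keep the best top_n."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Pretend these came back from the vector store in first-stage order
candidates = [
    "guan yu guarded jing province",
    "the physician scraped the bone of guan yu",
    "zhang fei held the bridge",
]
query = "guan yu bone scraped"
results = rerank(query, candidates, term_coverage)

# Debug-style dump of what the model would actually be shown
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Swapping `term_coverage` for a model-based scorer changes the quality of the ordering but not the shape of the pipeline: retrieve broadly, rescore narrowly, pass only the survivors to the LLM.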

Caveats

  • Default setup still requires an OpenAI API key; the “local LLM” paths exist but need extra config and code tweaks in executor.py.
  • Zilliz Cloud Pipelines currently ingests only via URL, not from local files or folders; local-file support is listed as “coming later.”
  • Answer consistency is explicitly noted as unstable because LLM generation is stochastic; the retrieval step is deterministic, but GPT-4’s output isn’t.
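The last caveat can be illustrated with a toy model of decoding. The sketch below is a generic picture of greedy versus temperature-based sampling, not OpenAI's implementation; the scores are made up and the function names are hypothetical. With identical input scores, the greedy choice never changes, while sampling can pick different tokens from one draw to the next.

```python
import math
import random

def greedy_token(logits):
    """Deterministic choice: always the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_token(logits, temperature, rng):
    """Stochastic choice: softmax sampling at the given temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 1.0]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy_picks = {greedy_token(logits) for _ in range(200)}
sampled_picks = {sample_token(logits, 1.0, rng) for _ in range(200)}

print("greedy picks: ", sorted(greedy_picks))
print("sampled picks:", sorted(sampled_picks))
```

This is why the same question over the same retrieved passages can still produce differently worded, or occasionally different, answers.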

Verdict

Worth a spin if you’re building domain-specific RAG and want a working reference for Chinese-language embedding models plus classical-text chunking. Skip it if you need a fully open, API-key-free stack out of the box: this is still fundamentally an OpenAI GPT-4 system in which the alternatives are escape hatches, not defaults.
