A Chinese-flavored RAG Swiss Army knife for local LLMs
Langchain-Chatchat wires LangChain to ChatGLM, Qwen, and Llama for fully offline knowledge-base Q&A with a Streamlit face.

What it does Langchain-Chatchat is a local knowledge-base Q&A application built on LangChain. It ingests documents, chunks and vectorizes them, retrieves relevant snippets via BM25 or KNN, and stuffs them into prompts for local LLMs like ChatGLM, Qwen, or Llama. It exposes a FastAPI backend and a Streamlit web UI, and can run entirely offline with open-source models.
The interesting bit The project is aggressively model-agnostic: it abstracts model loading through Xinference, Ollama, LocalAI, and FastChat, and wraps everything in an OpenAI-compatible SDK interface. The 0.3.x rewrite also added Agent capabilities optimized specifically for ChatGLM3 and Qwen, plus multimodal support via qwen-vl-chat.
Key highlights
- Supports mainstream open-source LLMs, embedding models, and vector databases (FAISS, Milvus implied by topics)
- File RAG unifies multiple retrieval methods: BM25 + KNN, not just brute vector search
- Agent mode with automatic or manual tool selection, plus database chat, arXiv lookup, Wolfram Alpha, and text-to-image
- Docker, pip, and source deployment options; AutoDL cloud GPU images available
- WebUI improvements in 0.3.x: better multi-session support and custom system prompts
Caveats
- Docker images noted as “will be updated soon” — deployment may require manual setup
- No training or fine-tuning included; the README explicitly states this is glueware you can improve downstream
- Agent stability was flagged as shaky in 0.2.x; 0.3.x claims improvement but real-world mileage with non-ChatGLM3/Qwen models may vary
Verdict Worth a look if you need a Chinese-optimized, fully offline RAG stack with broad model compatibility and don’t want to wire LangChain yourself. Skip if you need a polished SaaS experience or heavy custom fine-tuning infrastructure.