darrencxl0301/StageRAG
A lightweight RAG framework offering switchable 3-step (Speed) and 4-step (Precision) pipelines built on quantized Llama 3.2 1B/3B models.

StageRAG is a production-ready framework for building hallucination-resistant RAG applications. It provides two pipeline modes so users can trade latency for accuracy: Speed (3-step, ~3-5s) and Precision (4-step, ~6-12s). The framework loads knowledge bases from JSONL files, automatically builds vector indices for retrieval, and uses multi-component confidence scoring to detect uncertainty and reduce hallucinations. It runs on quantized Llama 3.2 1B and 3B models, which require 5-10GB of GPU memory.
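
To make the description above more concrete, here is a minimal, standard-library-only sketch of the general idea: load a JSONL knowledge base, build a small vector index, retrieve the best match, and combine several signals into one confidence score. It is not StageRAG's actual API. The `question`/`answer` field names, the bag-of-words `embed` function (a stand-in for a real embedding model), the helper names, and the `0.6`/`0.4` weights are all assumptions chosen for illustration.

```python
import json
import math
from collections import Counter

# Hypothetical JSONL knowledge base: one JSON object per line.
# The "question"/"answer" field names are assumptions, not StageRAG's schema.
KB_JSONL = """\
{"question": "How do I reset my password?", "answer": "Use the reset link on the login page."}
{"question": "What is the refund policy?", "answer": "Refunds are available within 30 days."}
"""


def embed(text):
    """Toy bag-of-words vector; a real system would use a neural encoder."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(jsonl_text):
    """Parse each non-empty JSONL line and pair it with its question vector."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return [(embed(rec["question"]), rec) for rec in records]


def retrieve(index, query, k=1):
    """Return the k (similarity, record) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, vec), rec) for vec, rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def confidence(retrieval_score, answer, context, weights=(0.6, 0.4)):
    """Blend two components: retrieval similarity and answer/context overlap.

    The choice of components and the weights are illustrative assumptions.
    """
    a, c = embed(answer), embed(context)
    overlap = sum((a & c).values()) / max(sum(a.values()), 1)
    return weights[0] * retrieval_score + weights[1] * overlap


if __name__ == "__main__":
    index = build_index(KB_JSONL)
    score, record = retrieve(index, "refund policy")[0]
    print(record["answer"], round(confidence(score, record["answer"], record["answer"]), 3))
```

In a sketch like this, a low combined score would be the signal to hedge or decline to answer rather than return a possibly hallucinated response; where that threshold sits is a tuning decision the description does not specify.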