A seven-week slog from Docker to agentic RAG, with homework
This repo is a structured course that builds a production arXiv research assistant week by week, starting with infrastructure and ending with LangGraph agents and a Telegram bot.

What it does
This is a curriculum disguised as a codebase. Over seven weekly releases, you build a complete RAG system that fetches arXiv papers, indexes them in OpenSearch, and answers research questions through a Gradio UI or Telegram bot. Each week adds a layer: Docker infrastructure, Airflow ingestion pipelines, BM25 keyword search, hybrid retrieval, local LLM integration, monitoring with Langfuse, and finally agentic RAG with query rewriting and document grading.
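To make the "hybrid retrieval" layer concrete, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion (RRF). This is an assumption for illustration only; nothing here says which fusion method the course uses. The function name, the document IDs, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists standing in for a BM25 query and a vector query.
bm25_hits = ["paper_a", "paper_b", "paper_c"]
vector_hits = ["paper_c", "paper_a", "paper_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

The point of the sketch is that keyword search stays a first-class signal: a paper that only the vector side finds (paper_d) still appears, but it ranks below papers that both retrievers agree on.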
The interesting bit
The course deliberately starts with boring, reliable keyword search before adding vectors. The README calls this the “professional path” — solid search foundations enhanced with AI, not AI-first approaches that ignore retrieval fundamentals. Week 7’s LangGraph workflow adds decision nodes that can rewrite queries or reject off-topic questions entirely, which is more guardrail architecture than most tutorial RAG systems bother with.
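The control flow behind those decision nodes can be sketched in plain Python, without LangGraph. This is a rough illustration under stated assumptions, not the course's implementation: the function names, the state fields, the retry limit, and the toy stand-ins for the guardrail, retriever, grader, and rewriter are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentState:
    query: str
    documents: List[str] = field(default_factory=list)
    rewrites: int = 0


def run_agent(
    question: str,
    is_on_topic: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_rewrites: int = 2,
) -> Tuple[str, AgentState]:
    """Guardrail -> retrieve -> grade -> (answer | rewrite and retry)."""
    state = AgentState(query=question)
    # Decision node 1: reject off-topic questions before any retrieval.
    if not is_on_topic(question):
        return "rejected", state
    while True:
        candidates = retrieve(state.query)
        # Decision node 2: keep only documents the grader accepts.
        state.documents = [d for d in candidates if grade(state.query, d)]
        if state.documents:
            return "answer", state
        if state.rewrites >= max_rewrites:
            return "no_relevant_documents", state
        # Decision node 3: nothing relevant, so rewrite the query and retry.
        state.query = rewrite(state.query)
        state.rewrites += 1


# Toy stand-ins (assumptions); a real system would call a search index and an LLM.
corpus = ["attention mechanisms in transformers", "sparse retrieval with bm25"]


def toy_guardrail(q: str) -> bool:
    return "pizza" not in q.lower()


def toy_retrieve(q: str) -> List[str]:
    return list(corpus)


def toy_grade(q: str, doc: str) -> bool:
    return any(word in doc.split() for word in q.lower().split())


def toy_rewrite(q: str) -> str:
    return q.replace("self-attn", "attention")


off_topic = run_agent("best pizza recipe", toy_guardrail, toy_retrieve, toy_grade, toy_rewrite)
on_topic = run_agent("self-attn", toy_guardrail, toy_retrieve, toy_grade, toy_rewrite)
print(off_topic[0], on_topic[0], on_topic[1].rewrites)
```

In a graph framework each branch here would be a conditional edge; the cap on rewrites is what keeps the retry loop from running forever when no document ever passes grading.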
Key highlights
- Weekly tagged releases let you clone specific stages (week1.0 through week7.0) without wading through the full codebase
- Full Docker Compose stack: FastAPI, PostgreSQL, OpenSearch, Airflow, Ollama, Redis, Langfuse, and a Gradio interface
- Uses Docling for scientific PDF parsing and Jina embeddings (free tier) for semantic search
- Includes notebooks for hands-on setup each week, plus companion blog posts on Substack
- Telegram bot integration in Week 7 for mobile access to the agentic pipeline
Caveats
- Requires 8GB+ RAM and 20GB disk space just to run the infrastructure locally
- You need to manually add API keys for Jina embeddings and Langfuse; the .env.example defaults don’t cover everything
- The README is enthusiastic about “production-grade” systems, but this is explicitly a learning project, not a maintained product
Verdict
Good fit if you want a structured, week-by-week walkthrough of RAG system construction with actual services to run, not just notebooks. Skip it if you already ship RAG in production and need reference architecture rather than pedagogy.