redis-developer/ArXivChatGuru

A Redis tutorial wearing a lab coat

A deliberately simple RAG demo that fetches arXiv papers, embeds them, and answers questions via Redis vector search.

563 stars · Python · RAG · Search · LLMOps · Eval
Star velocity (7d): +0.5 ★/day · Trend: steady

What it does

ArXiv ChatGuru is a Streamlit app that turns an arXiv topic into a searchable knowledge base. You pick a subject and paper count; it fetches from arXiv, chunks the PDFs, generates OpenAI embeddings, and stores everything in a Redis vector index. Questions get answered by retrieving the closest chunks and feeding them to a chat model via LangChain.
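To make the chunk, embed, and retrieve loop concrete, here is a toy, fully in-memory sketch of the same shape. It is not the repo's code. OpenAI embeddings are swapped for a bag-of-words term counter, Redis vector search is swapped for a brute-force cosine ranking, and the function names, chunk sizes, and prompt template are all hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words (hypothetical sizes)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    # Later start positions would only repeat words the previous window already covers.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts, standing in for text-embedding-3-small."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Brute-force nearest-chunk search; the real app delegates this to a Redis vector index."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Hypothetical prompt template: retrieved chunks become the model's grounding context."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    papers = [
        "Transformers use self-attention over token sequences.",
        "Graph neural networks pass messages between nodes.",
        "Diffusion models learn to reverse a noising process.",
    ]
    chunks = [c for p in papers for c in chunk_words(p)]
    hits = top_k("How does attention work in transformers?", chunks, k=1)
    print(hits)  # prints ['Transformers use self-attention over token sequences.']
    print(build_prompt("How does attention work in transformers?", hits))
```

The overlap between windows is there so a sentence split at a chunk boundary still appears whole in at least one chunk. In the app itself, the ranking happens inside Redis over dense OpenAI vectors rather than in a Python sort.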

The interesting bit

The README is admirably honest: this is “intentionally simple” and “a learning project,” not a production research assistant. That clarity is refreshing. The stats page, which exposes Redis index metadata and query-engine statistics, is a nice touch for seeing what is actually happening under the hood.

Key highlights

  • Topic-scoped Redis vector indexes keep different paper collections isolated
  • Docker-first local setup with make docker-up; local Python 3.13 + Poetry path also available
  • Built-in stats page to inspect index metadata, fields, and query engine performance
  • Uses GPT-4.1-mini and text-embedding-3-small by default (configurable via .env)
  • Clean Makefile with format, test, build, dev, and docker commands
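The per-topic isolation and the configurable model defaults listed above can be sketched together. This is an assumption-heavy illustration, not the repo's code: the environment-variable names CHAT_MODEL and EMBEDDING_MODEL, the arxiv- naming scheme, and the helper functions are hypothetical, and the schema dictionary is modeled on RedisVL's dict-based index schema even though the app may create its index through LangChain instead. The only fixed number is 1536, the default output dimension of text-embedding-3-small.

```python
import os
import re

DEFAULT_CHAT_MODEL = "gpt-4.1-mini"
DEFAULT_EMBED_MODEL = "text-embedding-3-small"
EMBED_DIMS = 1536  # default vector size returned by text-embedding-3-small


def load_models(env=os.environ) -> dict:
    """Read model names from the environment, falling back to the defaults.

    CHAT_MODEL and EMBEDDING_MODEL are hypothetical names; check the repo's .env file.
    """
    return {
        "chat_model": env.get("CHAT_MODEL", DEFAULT_CHAT_MODEL),
        "embedding_model": env.get("EMBEDDING_MODEL", DEFAULT_EMBED_MODEL),
    }


def topic_slug(topic: str) -> str:
    """Normalize a topic into a key-safe slug, e.g. 'Quantum Computing!' -> 'quantum-computing'."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return slug or "default"


def topic_schema(topic: str, dims: int = EMBED_DIMS) -> dict:
    """Build a per-topic index definition (layout modeled on RedisVL's dict schema; an assumption)."""
    slug = topic_slug(topic)
    return {
        # A distinct index name and key prefix per topic keeps collections apart.
        "index": {"name": f"arxiv-{slug}", "prefix": f"arxiv:{slug}", "storage_type": "hash"},
        "fields": [
            {"name": "content", "type": "text"},
            {
                "name": "embedding",
                "type": "vector",
                "attrs": {
                    "dims": dims,
                    "distance_metric": "cosine",
                    "algorithm": "flat",
                    "datatype": "float32",
                },
            },
        ],
    }
```

Because a Redis search index only covers keys under its own prefix, a query against one topic's index cannot return chunks stored for another topic, which is the kind of isolation the first highlight describes.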

Caveats

  • Requires an OpenAI API key; no local-model fallback is mentioned
  • “Planned follow-ups” include basic features like year/author filters and chat history, suggesting the current version is pretty bare-bones
  • No mention of rate limiting or cost controls for arXiv fetching or OpenAI calls

Verdict

Worth an hour if you’re learning RAG architecture and want to see Redis as a vector database in a complete, runnable pipeline. Skip it if you need a serious research tool or want to avoid OpenAI dependencies.
