neuml/codequestion

Stack Overflow in your terminal, no Wi-Fi required

A local semantic search engine that lets developers query Stack Exchange dumps without opening a browser—or needing a network connection.

542 stars · Python · RAG · Search · Other AI
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

codequestion is a Python CLI tool that downloads a pre-built semantic index of Stack Exchange questions and answers, then runs entirely offline. You type a coding question in natural language; it returns similar questions with scores, and can surface metadata like tags, dates, and accepted answers via a local SQLite database. It also exposes a standard txtai API if you want to query it over HTTP.
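
To make "returns similar questions with scores" concrete, here is a minimal sketch of similarity ranking. It is not codequestion's code: the three-dimensional vectors, the question titles, and the `search` helper are all hypothetical stand-ins for real sentence embeddings and a real index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, limit=2):
    """Return the `limit` best (title, score) pairs, highest score first."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]

# Hypothetical toy "embeddings"; a real model produces hundreds of dimensions.
index = {
    "How to read a file line by line in Python": [0.9, 0.1, 0.0],
    "Reverse a list in Python": [0.1, 0.9, 0.1],
    "Undo the last git commit": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the query "python read text file".
query = [0.8, 0.2, 0.1]

results = search(query, index)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

The file-reading question ranks first because its vector points in nearly the same direction as the query's, even though the two share few exact words.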

The interesting bit

The project is essentially a polished packaging job around txtai and sentence-transformers, but the packaging matters: it curates only highly-scored questions with accepted answers from 23 Stack Exchange sites, vectorizes them with all-MiniLM-L6-v2, and stores the result in a Faiss index. The latest release adds semantic graphs for topic modeling and path traversal—handy for seeing how two seemingly unrelated questions connect through intermediate concepts.
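
The path-traversal idea can be sketched with a plain breadth-first search. This is an illustration of the concept only, not txtai's graph implementation: the `shortest_path` helper and the hand-written adjacency list are hypothetical, and a real semantic graph would derive its edges from embedding similarity.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical graph: an edge means two questions are semantically close.
graph = {
    "parse json in python": ["read a file in python", "http request in python"],
    "read a file in python": ["parse json in python", "read a file in bash"],
    "http request in python": ["parse json in python"],
    "read a file in bash": ["read a file in python", "loop over lines in bash"],
    "loop over lines in bash": ["read a file in bash"],
}

path = shortest_path(graph, "parse json in python", "loop over lines in bash")
print(" -> ".join(path))
```

The two endpoints share no direct edge; the path runs through the two file-reading questions, which is the kind of intermediate connection the graph feature is meant to expose.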

Key highlights

  • Runs fully offline after a one-time model download (stored in ~/.codequestion/)
  • Ships with a VS Code integration: open an integrated terminal and type codequestion
  • Supports SQL-like queries over the txtai API to pull back specific metadata fields
  • MRR of 85.0 on Stack Exchange queries, up from 77.1 in the word-vectors/BM25 era
  • Build pipeline is documented and reproducible if you want to index your own Stack Exchange dumps
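
For readers unfamiliar with the MRR figure quoted above: mean reciprocal rank averages 1/rank of the first correct result across a set of queries. The sketch below uses made-up queries and a hypothetical `mean_reciprocal_rank` helper; it also assumes the reported 85.0 is the usual 0-to-1 value scaled by 100, which the writeup does not state.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the relevant item per query; a miss contributes 0."""
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        if target in results:
            total += 1.0 / (results.index(target) + 1)
    return total / len(ranked_results)

# Hypothetical evaluation: three queries, each with one known-correct question ID.
ranked_results = [
    ["q1", "q7", "q3"],  # correct answer at rank 1 -> 1.0
    ["q9", "q2", "q4"],  # correct answer at rank 2 -> 0.5
    ["q5", "q6", "q8"],  # correct answer missing   -> 0.0
]
relevant = ["q1", "q2", "q0"]

mrr = mean_reciprocal_rank(ranked_results, relevant)
print(f"MRR: {mrr:.3f} (x100: {mrr * 100:.1f})")
```

Under that scaling assumption, a score of 85.0 means the expected question usually lands in first or second position.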

Caveats

  • Rebuilding the default model yourself requires downloading a specific set of 23 Stack Exchange 7z dumps, and the ETL process is rigid about directory structure
  • The word-vectors path is effectively deprecated; the README still documents it but notes it’s “only necessary if using a word vectors model”
  • No mention of update cadence for the pre-built model; if Stack Exchange data ages, so do your answers

Verdict

Worth a look if you work offline often, burn cycles on repetitive Stack Overflow searches, or want a concrete example of how to ship a txtai application. Skip it if you need real-time answers (the index is static) or prefer your search results with live comment threads and community vetting.
