namuan/dr-doc-search

Chat with your PDFs, but bring your own OCR

A CLI tool that turns static PDFs into conversational search targets using GPT-3 or local HuggingFace models.

★ 598 stars · Python · RAG · Search · Language Models

What it does

Dr-doc-search ingests a PDF, rips it into page images, runs Tesseract OCR over them, builds a vector index, and exposes either a CLI Q&A mode or a local web UI (port 5006) where you can ask natural-language questions about the document’s contents. It started as an OpenAI-only tool; since v1.5.0 you can swap in HuggingFace embeddings and LLMs to keep your documents and your money local.

The interesting bit

The pipeline is deliberately low-tech: PDF → image → OCR → text chunks → embeddings. That makes it work on scanned books and image-heavy PDFs where pure text extraction fails, though it also means you’re one ImageMagick install away from dependency hell. The web UI is built with HoloViz Panel, which is an unusual but pragmatic choice for a solo dev tool.
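The last two stages of that pipeline (text chunks, then embeddings, then lookup) can be illustrated with a toy, standard-library-only sketch. This is not the tool's code: the function names are hypothetical, and the word-count "embedding" is a stand-in for the OpenAI or HuggingFace embeddings the project actually relies on.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split OCR text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(chunk: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", chunk.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_chunk(question: str, chunks: list[str]) -> str:
    """Return the chunk whose vector is closest to the question's vector."""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(q, embed(c)))


# Pretend these strings came out of OCR, one per page image.
pages = [
    "Tesseract converts each page image into plain text.",
    "The index stores one vector per chunk of text.",
]
chunks = [c for page in pages for c in chunk_text(page, max_words=10)]
print(best_chunk("What converts a page image into text?", chunks))
```

In a real setup the retrieved chunk would be passed to a language model as context for the answer; the sketch stops at retrieval.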

Key highlights

Supports both OpenAI (GPT-3) and local HuggingFace models for embeddings and answers
Web interface and CLI modes; page-range filtering for large documents
Outputs working files (images, OCR text, index) to ~/OutputDir/dr-doc-search/<pdf-name> for inspection or debugging
PyPI installable; automated release pipeline via Poetry and GitHub Actions
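To poke at the working files mentioned above, a small helper along these lines could locate and list them. It is a sketch under assumptions: both function names are hypothetical, and it assumes `<pdf-name>` means the PDF's file name without its extension, which the listing does not confirm.

```python
from pathlib import Path
from typing import Optional


def working_dir_for(pdf_path: str, home: Optional[Path] = None) -> Path:
    """Return the assumed working directory for a given PDF (hypothetical helper)."""
    base = (home or Path.home()) / "OutputDir" / "dr-doc-search"
    # Assumption: <pdf-name> is the file name without its extension.
    return base / Path(pdf_path).stem


def list_artifacts(workdir: Path) -> list[str]:
    """List files under the working directory as sorted relative POSIX paths."""
    if not workdir.is_dir():
        return []
    return sorted(
        p.relative_to(workdir).as_posix() for p in workdir.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    workdir = working_dir_for("manual.pdf")
    print(workdir)
    for name in list_artifacts(workdir):
        print("  ", name)
```

If the directory does not exist yet, for example because the PDF has not been indexed, the listing is simply empty.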

Caveats

Requires manual installation of Tesseract OCR and ImageMagick; Windows users must set an IMCONV environment variable
The README notes that OpenAI API costs apply after the trial period, but doesn’t quantify typical indexing or query costs
No mention of concurrent users, rate limiting, or how the web UI behaves with large documents
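Given the manual prerequisites in the first caveat, a preflight check before the first run can save some head-scratching. The following is a minimal sketch, not part of dr-doc-search: the function name is hypothetical, it only checks that the `tesseract` and ImageMagick executables are discoverable on PATH, and it only checks that IMCONV is set on Windows, not what it points to.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Mapping[str, str] = os.environ,
    platform: str = sys.platform,
) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = []
    if which("tesseract") is None:
        problems.append("tesseract not found on PATH")
    # ImageMagick 7 ships `magick`; older installs expose `convert`.
    # On Windows, `convert` may be the system disk utility, so only `magick` counts there.
    is_windows = platform.startswith("win")
    candidates = ("magick",) if is_windows else ("magick", "convert")
    if not any(which(name) for name in candidates):
        problems.append("ImageMagick not found on PATH")
    if is_windows and not env.get("IMCONV"):
        problems.append("IMCONV environment variable is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

The lookup function, environment, and platform are parameters purely so the check can be exercised without touching the real system.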

Verdict

Worth a spin if you have a shelf of scanned PDFs and want to query them without uploading them to a cloud service, provided you use the local HuggingFace models and are willing to wrangle OCR dependencies. Skip it if your PDFs are already text-native; simpler tools exist for that.

Frequently asked

What is namuan/dr-doc-search? A CLI tool that turns static PDFs into conversational search targets using GPT-3 or local HuggingFace models.
Is dr-doc-search open source? Yes. namuan/dr-doc-search is open source, released under the MIT license.
What language is dr-doc-search written in? namuan/dr-doc-search is primarily written in Python.
How popular is dr-doc-search? namuan/dr-doc-search has 598 stars on GitHub.
Where can I find dr-doc-search? namuan/dr-doc-search is on GitHub at https://github.com/namuan/dr-doc-search.