Chat with your PDFs, but bring your own OCR
A CLI tool that turns static PDFs into conversational search targets using GPT-3 or local HuggingFace models.

What it does
Dr-doc-search ingests a PDF, rips it into page images, runs Tesseract OCR over them, builds a vector index, and exposes either a CLI Q&A mode or a local web UI (port 5006) where you can ask natural-language questions about the document’s contents. It started as an OpenAI-only tool; since v1.5.0 you can swap in HuggingFace embeddings and LLMs to keep your documents and your money local.
The interesting bit
The pipeline is deliberately low-tech: PDF → image → OCR → text chunks → embeddings. That makes it work on scanned books and image-heavy PDFs where pure text extraction fails, though it also means you’re one ImageMagick install away from dependency hell. The web UI is built with HoloViz Panel, which is an unusual but pragmatic choice for a solo dev tool.
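The last three stages of that pipeline (text → chunks → embeddings → lookup) can be sketched in a few lines. This is a toy illustration, not dr-doc-search's actual code: it assumes the OCR text for each page is already available as a string, and it uses a bag-of-words vector as a stand-in for the OpenAI or HuggingFace embeddings. Every function name here (chunk_text, embed, build_index, search) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split OCR output into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages, max_words=50):
    """pages: one OCR text string per page. Returns (page_no, chunk, vector) entries."""
    index = []
    for page_no, page_text in enumerate(pages, start=1):
        for chunk in chunk_text(page_text, max_words):
            index.append((page_no, chunk, embed(chunk)))
    return index


def search(index, question, top_k=1):
    """Return the top_k (page_no, chunk) pairs most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(page_no, chunk) for page_no, chunk, _ in ranked[:top_k]]


# Hypothetical OCR output for a two-page document.
pages = [
    "The warranty covers parts and labour for two years.",
    "To reset the device hold the power button for ten seconds.",
]
index = build_index(pages)
best_page, best_chunk = search(index, "How do I reset the device?")[0]
print(best_page, best_chunk)
```

In a real setup the retrieved chunks would be handed to the language model as context for the answer; the sketch stops at retrieval because that is the part the OCR quality directly affects.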
Key highlights
- Supports both OpenAI (GPT-3) and local HuggingFace models for embeddings and answers
- Web interface and CLI modes; page-range filtering for large documents
- Outputs working files (images, OCR text, index) to ~/OutputDir/dr-doc-search/<pdf-name> for inspection or debugging
- PyPI installable; automated release pipeline via Poetry and GitHub Actions
Caveats
- Requires manual installation of Tesseract OCR and ImageMagick; Windows users must set an IMCONV environment variable
- The README notes OpenAI API costs apply after the trial period, but doesn’t quantify typical indexing or query costs
- No mention of concurrent users, rate limiting, or how the web UI behaves with large documents
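Since the external dependencies are the most likely point of failure, a quick preflight check before the first run can save some head-scratching. The sketch below is not part of dr-doc-search; missing_prerequisites is a hypothetical helper. It assumes the Tesseract binary is named tesseract and that ImageMagick is reachable as magick (version 7) or convert (version 6), and it only checks that IMCONV is set on Windows, not what it should point to.

```python
import os
import shutil
import sys


def missing_prerequisites(which=shutil.which, environ=os.environ, platform=sys.platform):
    """Return a list of human-readable problems; an empty list means all checks passed.

    The which/environ/platform parameters exist so the check can be exercised
    without touching the real system.
    """
    problems = []
    if which("tesseract") is None:
        problems.append("tesseract not found on PATH")
    # ImageMagick 7 ships `magick`; ImageMagick 6 ships `convert`.
    # On Windows, `convert` may also resolve to an unrelated system utility,
    # so a hit there is weaker evidence than a hit on `magick`.
    if which("magick") is None and which("convert") is None:
        problems.append("ImageMagick not found on PATH")
    # The README asks Windows users to set IMCONV; only its presence is checked here.
    if platform.startswith("win") and not environ.get("IMCONV"):
        problems.append("IMCONV environment variable is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Running the script prints one line per missing prerequisite and nothing at all when everything it checks for is in place.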
Verdict
Worth a spin if you have a shelf of scanned PDFs and want to query them without uploading them to a cloud service (which means choosing the HuggingFace models over OpenAI), provided you’re willing to wrangle OCR dependencies. Skip it if your PDFs are already text-native; simpler tools exist for that.