Is docstrange open source?

Yes — NanoNets/docstrange is open source, released under the MIT license.

What language is docstrange written in?

NanoNets/docstrange is primarily written in Python.

How popular is docstrange?

NanoNets/docstrange has 1.5k stars on GitHub.

Where can I find docstrange?

NanoNets/docstrange is on GitHub at https://github.com/NanoNets/docstrange.


NanoNets/docstrange

A document parser that actually runs offline

DocStrange converts PDFs, scans, and Office files into structured Markdown or JSON—locally, if you want.

★ 1.5k stars · Python · Data Tooling


What it does

DocStrange is a Python library that ingests PDFs, Word documents, PowerPoint decks, Excel sheets, images, and even URLs, and converts them into Markdown, JSON, CSV, or HTML. It runs OCR on scanned documents and photos, extracts tables with clean formatting, and can target specific fields or conform to a custom JSON schema. It also ships a built-in local web UI for drag-and-drop conversion.

The interesting bit

The dual-mode architecture is the real hook. Cloud processing is the default (free for up to 10,000 documents per month), but set gpu=True and the entire pipeline, including OCR, layout detection, and a 7B-parameter model, runs entirely on your own hardware, so no document data leaves the machine. That is increasingly rare in the “AI document processing” space, where most tools are API-shaped black boxes.
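Because cloud is the default, the gpu=True flag is the whole privacy story. The sketch below wraps it so that local processing becomes the default inside your own code. The docstrange names it uses (DocumentExtractor, extract(), extract_markdown()) reflect our reading of the project README and should be treated as assumptions to verify; extractor_kwargs and convert_to_markdown are hypothetical helpers, not part of the library.

```python
# Sketch only. DocumentExtractor, extract() and extract_markdown() are ASSUMED
# from the docstrange README; verify against the current docs before use.
# extractor_kwargs() and convert_to_markdown() are hypothetical helpers.


def extractor_kwargs(keep_data_local: bool) -> dict:
    """Constructor arguments for the chosen processing mode.

    Cloud processing is DocStrange's default, so keeping documents on the
    machine has to be requested explicitly with gpu=True (CUDA required).
    """
    return {"gpu": True} if keep_data_local else {}


def convert_to_markdown(path: str, keep_data_local: bool = True) -> str:
    """Convert one document to Markdown, defaulting to local processing."""
    # Imported lazily so this module loads even where docstrange is absent.
    from docstrange import DocumentExtractor  # assumed import path

    extractor = DocumentExtractor(**extractor_kwargs(keep_data_local))
    result = extractor.extract(path)  # assumed: returns a result object
    return result.extract_markdown()  # assumed: Markdown as a string
```

Defaulting keep_data_local to True inverts the library's cloud-first default within your codebase: a forgotten argument fails toward privacy (or toward a missing-CUDA error) rather than toward an upload.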

Key highlights

Supports PDF, DOCX, PPTX, XLSX, images, and URLs as inputs
Outputs LLM-optimized Markdown, structured JSON with schema support, HTML, and CSV
Local mode requires CUDA for GPU acceleration; CPU fallback is mentioned but not detailed
Built-in web interface on localhost:8000, installed with pip install "docstrange[web]"
MCP server integration for Claude Desktop document navigation
Models download automatically on first local run
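Schema support means you describe the JSON you want and DocStrange fills it in. Because that filling is model-driven, it is prudent to check the result before it enters a pipeline. The helper below is hypothetical and not part of docstrange: a minimal, standard-library-only check of required fields and top-level types against a simple JSON-Schema-style dict. For full JSON Schema validation, a dedicated validator such as the jsonschema package is the better tool.

```python
# Hypothetical helper, NOT part of docstrange: a minimal check that extracted
# JSON matches the "required" list and top-level "properties" types of a
# simple JSON-Schema-style dict. Nested schemas are not inspected.
from typing import Any

# JSON Schema type names mapped to the Python types json.loads produces.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def schema_problems(data: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return human-readable mismatches; an empty list means the check passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in data or "type" not in spec:
            continue
        value, expected = data[name], spec["type"]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            problems.append(f"{name}: expected {expected}, got boolean")
        elif not isinstance(value, _JSON_TYPES[expected]):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


if __name__ == "__main__":
    invoice_schema = {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["invoice_number", "total"],
    }
    print(schema_problems({"total": "42.5"}, invoice_schema))
    # ['missing required field: invoice_number', 'total: expected number, got str']
```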

Caveats

The README claims “works on GPU or CPU when running locally”, but the local processing section only documents gpu=True and notes that CUDA is required; CPU behavior is unclear
Cloud mode is default, so privacy requires explicit opt-in to local mode
“7B model” is referenced but not named or characterized beyond parameter count
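Since CUDA is the only documented requirement for local mode, a pre-flight check can fail loudly before any document is processed, instead of leaving you to discover CPU behavior by trial. This sketch ASSUMES the local pipeline is PyTorch-based (nothing above confirms that), so it probes torch.cuda.is_available() when PyTorch is importable and otherwise reports no CUDA. cuda_ready and require_local_mode are hypothetical names.

```python
# Hypothetical pre-flight check, NOT part of docstrange. It ASSUMES the local
# pipeline runs on PyTorch; if it does not, swap in a different CUDA probe.
import importlib.util


def cuda_ready() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


def require_local_mode(has_cuda: bool) -> dict:
    """Return local-mode kwargs, or raise rather than silently using the cloud."""
    if not has_cuda:
        raise RuntimeError(
            "No CUDA device detected: local mode is only documented with "
            "gpu=True on CUDA, and CPU behavior is unclear."
        )
    return {"gpu": True}
```

A caller would pass require_local_mode(cuda_ready()) as keyword arguments to the extractor, so a machine without CUDA stops with an explicit error.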

Verdict

Worth a look if you’re building RAG pipelines or data-extraction workflows and need an escape hatch from cloud-only APIs. Skip it if you need transparent model provenance or guaranteed CPU-only local operation.
