Is zerox open source?

Yes — getomni-ai/zerox is open source, released under the MIT license.

What language is zerox written in?

getomni-ai/zerox is primarily written in TypeScript.

How popular is zerox?

getomni-ai/zerox has 12.2k stars on GitHub.

Where can I find zerox?

getomni-ai/zerox is on GitHub at https://github.com/getomni-ai/zerox.

getomni-ai/zerox

OCR by asking GPT-4o to read screenshots of your PDF

Zerox turns documents into images and feeds them to vision models, because tables and weird layouts break traditional text extractors.

★ 12.2k stars · TypeScript · Data Tooling

What it does

Zerox converts PDFs, DOCX files, and images into a sequence of page images, then asks a vision model (GPT-4o, Claude 3, Gemini, etc.) to transcribe each one into Markdown. It aggregates the results and hands back structured text with optional per-page metadata like token counts and completion time. There’s also a JSON Schema extraction mode if you need structured data instead of raw Markdown.
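To make the page-by-page flow concrete, here is a minimal sketch of the transcribe-then-aggregate step. It is not Zerox’s code: the `PageResult`, `DocumentResult`, and `Transcriber` types and the `transcribeDocument` function are hypothetical names invented for illustration, and the vision-model call is passed in as a plain function so the sketch runs without any API key.

```typescript
// Hypothetical shapes for illustration only; these are not Zerox's real types.
interface PageResult {
  page: number;
  markdown: string;
  inputTokens: number;
  outputTokens: number;
}

interface DocumentResult {
  pages: PageResult[];
  markdown: string;
  inputTokens: number;
  outputTokens: number;
  completionTimeMs: number;
}

// Stand-in for the vision-model call: one page image in, Markdown plus usage out.
type Transcriber = (
  pageImage: Uint8Array,
  page: number
) => Promise<Omit<PageResult, "page">>;

async function transcribeDocument(
  pageImages: Uint8Array[],
  transcribe: Transcriber
): Promise<DocumentResult> {
  const started = Date.now();

  // Transcribe every page independently; Promise.all keeps page order.
  const pages: PageResult[] = await Promise.all(
    pageImages.map(async (image, i) => ({
      page: i + 1,
      ...(await transcribe(image, i + 1)),
    }))
  );

  // Aggregate per-page output into one document-level result.
  return {
    pages,
    markdown: pages.map((p) => p.markdown).join("\n\n"),
    inputTokens: pages.reduce((sum, p) => sum + p.inputTokens, 0),
    outputTokens: pages.reduce((sum, p) => sum + p.outputTokens, 0),
    completionTimeMs: Date.now() - started,
  };
}
```

Passing the transcriber in as a function keeps the aggregation logic testable with a fake model, which matters when every real call costs tokens.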

The interesting bit

The maintainFormat option chains pages sequentially: each page’s Markdown is fed into the prompt for the next page, which helps preserve tables that span page breaks. The trade-off is speed: concurrency drops to 1, so pages can no longer be transcribed in parallel. It’s a blunt but effective hack for a genuinely hard layout problem.
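The chaining idea behind maintainFormat can be sketched in a few lines. This is an assumption about the general technique, not Zerox’s implementation: `TranscribePage` and `transcribeSequentially` are hypothetical names, and the model call is injected as a function. Each call has to wait for the previous page’s output, which is why this mode cannot run in parallel.

```typescript
// Hypothetical model call: takes a page image reference and, optionally,
// the Markdown produced for the previous page.
type TranscribePage = (
  pageImage: string,
  previousMarkdown?: string
) => Promise<string>;

async function transcribeSequentially(
  pages: string[],
  transcribe: TranscribePage
): Promise<string[]> {
  const results: string[] = [];
  let previous: string | undefined;

  // Strictly one page at a time: page N+1 needs page N's Markdown as context.
  for (const page of pages) {
    const markdown = await transcribe(page, previous);
    results.push(markdown);
    previous = markdown;
  }
  return results;
}
```

Only the immediately preceding page is carried forward here, which keeps the prompt size bounded; a real implementation may choose differently.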

Key highlights

Supports OpenAI, Azure, AWS Bedrock, and Google Gemini with provider-specific credential passing
Node and Python SDKs, though feature parity is uneven (Node has schema extraction and orientation correction; Python has custom system prompts and Vertex AI)
Optional per-page structured extraction with a separate model/provider from the OCR step
Configurable concurrency, DPI, image compression, and temp directory management
Requires system dependencies: graphicsmagick/ghostscript for Node, poppler for Python
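For a sense of what a concurrency setting typically controls, here is a generic bounded-parallelism helper. It is a sketch of the common worker-pool pattern, not code taken from Zerox; `mapWithConcurrency` is a hypothetical name, and the worker would be the per-page model call.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start no more runners than there are items, and at least one.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With `limit` set to 1 this degrades to strictly sequential processing, which matches the speed trade-off described for maintainFormat.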

Caveats

The Node and Python versions diverge significantly; check the feature matrix before picking one
No pricing or latency benchmarks are provided, and token counts from the example suggest multi-page documents could get expensive quickly
README mentions Tesseract workers in the API but doesn’t explain when they activate vs. vision-model fallback
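Since no pricing figures are provided, a rough estimate has to come from your own token counts and your provider’s current rates. The helper below is a sketch under that assumption: `PageUsage`, `Pricing`, and `estimateCost` are hypothetical names, and the rates are caller-supplied placeholders, not real prices.

```typescript
// Per-page token usage, e.g. taken from per-page metadata.
interface PageUsage {
  inputTokens: number;
  outputTokens: number;
}

// Caller-supplied rates in your currency per one million tokens.
// No real provider prices are assumed here.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

function estimateCost(pages: PageUsage[], pricing: Pricing): number {
  const inputTokens = pages.reduce((sum, p) => sum + p.inputTokens, 0);
  const outputTokens = pages.reduce((sum, p) => sum + p.outputTokens, 0);
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}
```

Running this over a handful of representative documents before committing to a provider gives a per-page figure you can multiply out to your expected volume.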

Verdict

Worth a look if you’re already paying for vision-model API access and your documents have complex layouts that defeat conventional OCR. Skip it if you need deterministic, offline, or low-cost extraction: this is fundamentally a cloud-API wrapper with image preprocessing glue.
