CosmosShadow/gptpdf

For $0.013 a page, GPT-4o reads your PDFs so you don't have to

It farms out the hard parts of PDF extraction—tables, math, layouts—to a vision model instead of brittle heuristics.

★ 3.6k stars · Python · Data Tooling

What it does

gptpdf converts PDFs into Markdown by rendering each page, marking non-text regions with bounding boxes via PyMuPDF, and asking a multimodal LLM like GPT-4o to transcribe the annotated result. The tool extracts both the structured text and any embedded images, returning a Markdown file plus a list of image paths. The authors claim it handles typography, math formulas, tables, pictures, and charts, and cite an average cost of about $0.013 per page.
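
The region-marking step described above might look roughly like the sketch below. It is an illustration under stated assumptions, not gptpdf's actual code: the names `keep_large_boxes` and `annotate_page`, the `min_side` threshold, and the choice of image and vector-drawing boxes as the "non-text regions" are all hypothetical. Only the PyMuPDF calls themselves are real library functions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def keep_large_boxes(boxes: List[Box], min_side: float = 20.0) -> List[Box]:
    """Drop boxes narrower or shorter than min_side (a hypothetical threshold)."""
    return [
        b for b in boxes
        if (b[2] - b[0]) >= min_side and (b[3] - b[1]) >= min_side
    ]


def annotate_page(pdf_path: str, page_index: int, out_png: str,
                  zoom: float = 2.0) -> List[Box]:
    """Outline image and drawing regions on one page, then save it as a PNG."""
    import fitz  # PyMuPDF; imported here so keep_large_boxes works without it

    doc = fitz.open(pdf_path)
    try:
        page = doc[page_index]
        # Assumption: embedded images and vector drawings stand in for
        # the "non-text regions" the summary mentions.
        boxes = [tuple(info["bbox"]) for info in page.get_image_info()]
        boxes += [tuple(d["rect"]) for d in page.get_drawings()]
        boxes = keep_large_boxes(boxes)
        for box in boxes:
            # Draw a red outline so a vision model can refer to the region.
            page.draw_rect(fitz.Rect(box), color=(1, 0, 0), width=1)
        # Render the annotated page at the requested zoom factor.
        page.get_pixmap(matrix=fitz.Matrix(zoom, zoom)).save(out_png)
        return boxes
    finally:
        doc.close()
```

The resulting PNG is what would then be sent to the vision model along with a transcription prompt; that request is omitted here because it depends on the provider.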

The interesting bit

The whole pipeline is just 293 lines of code. Rather than building a complex document engine, the project treats PDF parsing as a computer-vision task: point the model at the tricky bits and let it infer the layout.

Key highlights

Exposes a single function parse_pdf with tunable prompts (prompt, rect_prompt, role_prompt) to guide how the model interprets text and marked regions.
Supports any OpenAI-compatible vision endpoint, including GPT-4o, GLM-4V, Qwen-VL-Max, Yi-Vision, and Azure OpenAI.
Returns extracted images alongside Markdown output, keeping figures and diagrams separate from the text stream.
Ships with example conversions of real academic papers, such as Attention Is All You Need.
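
A call to parse_pdf with custom prompts could be wrapped as in the sketch below. The prompt keys (prompt, rect_prompt, role_prompt) come from the highlights above; the keyword names `output_dir`, `api_key`, `base_url`, and `model`, and the two-value return, are assumptions about the library's signature that should be checked against the README. `build_prompts` and `convert` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple


def build_prompts(prompt: Optional[str] = None,
                  rect_prompt: Optional[str] = None,
                  role_prompt: Optional[str] = None) -> Dict[str, str]:
    """Collect only the prompt overrides that were actually supplied."""
    candidates = {
        "prompt": prompt,
        "rect_prompt": rect_prompt,
        "role_prompt": role_prompt,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def convert(pdf_path: str, api_key: str, output_dir: str = "./out",
            base_url: Optional[str] = None, model: str = "gpt-4o",
            prompts: Optional[Dict[str, str]] = None) -> Tuple[str, List[str]]:
    """Run gptpdf on one file; keyword names are assumed, not verified."""
    from gptpdf import parse_pdf  # requires `pip install gptpdf`

    content, image_paths = parse_pdf(
        pdf_path,
        output_dir=output_dir,
        api_key=api_key,
        base_url=base_url,      # assumed hook for OpenAI-compatible endpoints
        model=model,
        prompt=prompts or None,  # pass None to keep the library defaults
    )
    return content, image_paths
```

For example, `build_prompts(role_prompt="You are a careful PDF transcriber.")` yields a dictionary with a single override, leaving the other two prompts at their defaults.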

Caveats

Requires an external vision-LLM API; the README does not mention support for fully local or offline execution.
The “almost perfect” parsing claim is qualitative; no benchmarks or error rates are provided against traditional extractors.

Verdict

Useful if you need readable Markdown from complex, layout-heavy PDFs and can justify per-page API spend. Look elsewhere if you need deterministic, offline extraction.
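
To judge whether the per-page spend is justifiable, a back-of-the-envelope estimate is enough. The sketch below uses the authors' quoted average of $0.013 per page; actual cost will vary with page content and model pricing, and the function names are hypothetical.

```python
import math

COST_PER_PAGE_USD = 0.013  # average quoted by the gptpdf authors


def estimate_cost(pages: int, cost_per_page: float = COST_PER_PAGE_USD) -> float:
    """Approximate API spend in USD for a given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return round(pages * cost_per_page, 2)


def pages_within_budget(budget_usd: float,
                        cost_per_page: float = COST_PER_PAGE_USD) -> int:
    """Largest whole number of pages that fits within the budget."""
    if budget_usd < 0:
        raise ValueError("budget_usd must be non-negative")
    return math.floor(budget_usd / cost_per_page)


print(estimate_cost(200))        # 200 pages at the quoted average
print(pages_within_budget(10))   # pages affordable with a $10 budget
```

At the quoted rate, a 200-page document comes to about $2.60, and a $10 budget covers roughly 769 pages.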

Frequently asked

What is CosmosShadow/gptpdf? It farms out the hard parts of PDF extraction (tables, math, layouts) to a vision model instead of brittle heuristics.
Is gptpdf open source? Yes. CosmosShadow/gptpdf is open source, released under the MIT license.
What language is gptpdf written in? CosmosShadow/gptpdf is primarily written in Python.
How popular is gptpdf? CosmosShadow/gptpdf has 3.6k stars on GitHub.
Where can I find gptpdf? CosmosShadow/gptpdf is on GitHub at https://github.com/CosmosShadow/gptpdf.