facebookresearch/nougat

Meta's PDF parser that actually reads the math

Nougat turns academic PDFs into structured markdown, including LaTeX equations and tables, using a vision transformer trained on arXiv papers.

10k stars · Python · Computer Vision · Data Tooling
Velocity (7d): +9.1 ★/day · Trend: steady

What it does

Nougat is a neural PDF-to-markdown converter built specifically for academic documents. Feed it a paper and it writes .mmd files: lightweight markup with LaTeX math and tables intact. It runs via the CLI, a Python API, or a local HTTP server on port 8503. Two model checkpoints are available: 0.1.0-small (the default) and 0.1.0-base.
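
As a rough illustration of the CLI route, the sketch below only assembles a command line and does not run anything. It assumes the `-o` (output directory) and `-m` (model tag) flags from the project's CLI help, and that output lands at `<out_dir>/<pdf stem>.mmd`. `build_nougat_command` and `expected_output_path` are hypothetical helper names, not part of Nougat.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# The two checkpoint tags named in the text above.
KNOWN_MODELS = ("0.1.0-small", "0.1.0-base")


def build_nougat_command(pdf: str, out_dir: str, model: Optional[str] = None) -> List[str]:
    """Return an argv list for the nougat CLI; nothing is executed here.

    Hypothetical helper. Assumes the CLI accepts `-o` and `-m`.
    """
    if model is not None and model not in KNOWN_MODELS:
        raise ValueError(f"unknown model tag: {model!r}")
    cmd = ["nougat", pdf, "-o", out_dir]
    if model is not None:
        # Leaving the flag out lets the CLI fall back to its default model.
        cmd += ["-m", model]
    return cmd


def expected_output_path(pdf: str, out_dir: str) -> Path:
    """Assumption: the result is written as <out_dir>/<pdf stem>.mmd."""
    return Path(out_dir) / (Path(pdf).stem + ".mmd")


cmd = build_nougat_command("paper.pdf", "out", model="0.1.0-base")
print(shlex.join(cmd))
print(expected_output_path("paper.pdf", "out"))
```

With Nougat installed, the list can be handed to `subprocess.run(cmd, check=True)`. Building an argv list rather than a shell string avoids quoting problems with PDF filenames that contain spaces.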

The interesting bit

Instead of treating PDF conversion as a text-extraction problem, Nougat treats it as a vision problem. It builds on the Donut architecture: each page is rendered to an image and decoded directly into a markup sequence, with no traditional OCR pipeline. The model was trained on arXiv and PubMed Central papers, so it handles two-column layouts, inline math, and the general chaos of TeX-generated PDFs.

Key highlights

  • Outputs Mathpix-compatible markdown with LaTeX tables and equations preserved
  • CLI supports batch processing, page ranges (-p 1-4,7), and directory inputs
  • Optional API mode (nougat_api) for HTTP POST requests with start/stop page parameters
  • Training and fine-tuning pipeline included via train.py and YAML configs
  • Dataset generation tools provided, though they require LaTeXML, pdffigures2, and non-trivial setup
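
To make the page-range and API bullets concrete, here is a small sketch of what a spec like `1-4,7` denotes and how a request URL with `start`/`stop` might be assembled. It is not Nougat's own parser: the inclusive reading of ranges is an assumption, the `/predict/` path and `127.0.0.1` host are assumptions taken from the project's README, and `expand_page_spec` and `predict_url` are hypothetical names.

```python
from typing import List, Optional
from urllib.parse import urlencode


def expand_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-4,7" into sorted, de-duplicated page numbers.

    Assumption: ranges are inclusive on both ends.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            lo, hi = int(first), int(last)
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(lo, hi + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def predict_url(
    base: str = "http://127.0.0.1:8503",
    start: Optional[int] = None,
    stop: Optional[int] = None,
) -> str:
    """Build the POST target for the local API, adding start/stop only if given.

    Assumption: the endpoint lives at /predict/ under the base URL.
    """
    params = {k: v for k, v in (("start", start), ("stop", stop)) if v is not None}
    url = base.rstrip("/") + "/predict/"
    return url + ("?" + urlencode(params) if params else "")


print(expand_page_spec("1-4,7"))
print(predict_url(start=1, stop=5))
```

The PDF itself would be sent as a multipart form upload in the POST body. That part is left out here because it depends on the HTTP client in use.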

Caveats

  • Best on English; other Latin-script languages may work, but Chinese, Russian, Japanese, etc. will not
  • The failure detection heuristic misfires on some CPUs/GPUs and emits [MISSING_PAGE] in place of the page text; pass --no-skipping if this happens
  • Model weights are CC-BY-NC (non-commercial), while the code is MIT
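
One way to act on the [MISSING_PAGE] caveat is to scan the generated .mmd text for the marker and rerun with `--no-skipping` only when it shows up. The sketch below assumes the marker may carry a suffix inside the brackets, so the pattern matches anything beginning with `[MISSING_PAGE`. `count_missing_pages` and `retry_command` are hypothetical helpers, and the threshold is an arbitrary choice.

```python
import re
from typing import List, Optional

# Matches "[MISSING_PAGE]" and, as an assumption, variants with a suffix
# such as "[MISSING_PAGE_FAIL:2]".
MISSING_PAGE_RE = re.compile(r"\[MISSING_PAGE[^\]\n]*\]")


def count_missing_pages(mmd_text: str) -> int:
    """Count missing-page markers in the text of a .mmd file."""
    return len(MISSING_PAGE_RE.findall(mmd_text))


def retry_command(
    original_cmd: List[str], mmd_text: str, threshold: int = 1
) -> Optional[List[str]]:
    """Return a rerun command with --no-skipping, or None if no rerun is needed."""
    if "--no-skipping" in original_cmd:
        return None  # already disabled; rerunning would change nothing
    if count_missing_pages(mmd_text) < threshold:
        return None
    return original_cmd + ["--no-skipping"]


sample = "# Title\n\n[MISSING_PAGE_FAIL:2]\n\nSome text.\n\n[MISSING_PAGE]\n"
print(count_missing_pages(sample))
print(retry_command(["nougat", "paper.pdf", "-o", "out"], sample))
```

If the CLI skips PDFs whose output already exists, the rerun may need an extra flag to force recomputation. Check `nougat --help` rather than relying on this sketch.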

Verdict

Researchers building RAG pipelines, citation tools, or anything that needs structured text from PDFs should try this. If your documents aren’t academic papers or you need commercial use of the weights, look elsewhere.
