Is nougat open source?

Yes. The facebookresearch/nougat code is open source under the MIT license; the model weights are released separately under CC-BY-NC (non-commercial).

What language is nougat written in?

facebookresearch/nougat is primarily written in Python.

How popular is nougat?

facebookresearch/nougat has 10k stars on GitHub.

Where can I find nougat?

facebookresearch/nougat is on GitHub at https://github.com/facebookresearch/nougat.

facebookresearch/nougat

Meta's PDF parser that actually reads the math

Nougat turns academic PDFs into structured markdown, including LaTeX equations and tables, using a vision transformer trained on arXiv papers.

★ 10k stars · Python · Computer Vision · Data Tooling

What it does

Nougat is a neural PDF-to-markdown converter built specifically for academic documents. Feed it a paper and it spits out .mmd files—lightweight markup with LaTeX math and tables intact. It runs via CLI, Python API, or a local HTTP server on port 8503. Two model sizes exist: 0.1.0-small (default) and 0.1.0-base.
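As a rough illustration of the CLI route, the sketch below assembles a nougat command line and runs it with subprocess. The helper names (build_nougat_command, convert) are hypothetical; the -o and -m flags are written as I recall them from the project README, and the assumption that the output file is named after the PDF with an .mmd suffix should be checked against your installed version.

```python
import shlex
import subprocess
from pathlib import Path


def build_nougat_command(pdf, out_dir, model="0.1.0-small", pages=None, no_skipping=False):
    """Assemble an argv list for the nougat CLI without running anything."""
    # Assumed flags: -o (output directory) and -m (model tag).
    cmd = ["nougat", str(Path(pdf)), "-o", str(Path(out_dir)), "-m", model]
    if pages:
        # Page selection such as "1-4,7".
        cmd += ["-p", pages]
    if no_skipping:
        # Turns off the failure-detection heuristic.
        cmd.append("--no-skipping")
    return cmd


def convert(pdf, out_dir, **options):
    """Run the CLI. Requires nougat to be installed and on PATH."""
    cmd = build_nougat_command(pdf, out_dir, **options)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
    # Assumption: the result is written as <pdf stem>.mmd inside out_dir.
    return Path(out_dir) / (Path(pdf).stem + ".mmd")
```

Calling convert("paper.pdf", "out", model="0.1.0-base") would then invoke the base model instead of the default small one.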

The interesting bit

Instead of treating PDFs as text extraction problems, Nougat treats them as vision problems. It builds on the Donut architecture, which is pure image-to-sequence with no traditional OCR pipeline: each page is rendered as an image and the model decodes markup directly from it. The model was trained on arXiv and PubMed Central papers, so it handles two-column layouts, inline math, and the general chaos of TeX-generated PDFs.

Key highlights

Outputs Mathpix-compatible markdown with LaTeX tables and equations preserved
CLI supports batch processing, page ranges (-p 1-4,7), and directory inputs
Optional API mode (nougat_api) for HTTP POST requests with start/stop page parameters
Training and fine-tuning pipeline included via train.py and YAML configs
Dataset generation tools provided, though they require LaTeXML, pdffigures2, and non-trivial setup
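To make the page-selection options above concrete, here is a small sketch. expand_pages shows how a spec like 1-4,7 reads as a list of pages; it is an illustration of the syntax, not Nougat's own parser. predict_url builds a request URL with start/stop query parameters. The /predict/ path and the multipart field name "file" are assumptions based on my recollection of the project README, and post_pdf relies on the third-party requests package.

```python
from urllib.parse import urlencode


def expand_pages(spec):
    """Expand a page spec such as "1-4,7" into [1, 2, 3, 4, 7]."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            pages.extend(range(first, last + 1))
        elif part:
            pages.append(int(part))
    return pages


def predict_url(host="127.0.0.1", port=8503, start=None, stop=None):
    """Build the request URL, adding start/stop only when they are given."""
    # Assumption: the server exposes a /predict/ endpoint.
    url = f"http://{host}:{port}/predict/"
    params = {k: v for k, v in (("start", start), ("stop", stop)) if v is not None}
    return f"{url}?{urlencode(params)}" if params else url


def post_pdf(pdf_path, **url_options):
    """POST a PDF to a running nougat_api server and return the raw response body."""
    import requests  # third-party; imported lazily so the helpers above work without it

    with open(pdf_path, "rb") as fh:
        # Assumption: the upload field is called "file".
        files = {"file": (pdf_path, fh, "application/pdf")}
        response = requests.post(predict_url(**url_options), files=files)
    response.raise_for_status()
    return response.text
```

For example, post_pdf("thesis.pdf", start=1, stop=5) would ask a locally running server for the first five pages only.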

Caveats

Trained mostly on English text; other Latin-script languages may work, but Chinese, Russian, Japanese, etc. will not
The failure-detection heuristic misfires on some CPUs/GPUs, so pages come back as [MISSING_PAGE]; pass --no-skipping to turn it off if this happens
Model weights are CC-BY-NC (non-commercial), while the code is MIT
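One way to act on the [MISSING_PAGE] caveat is to scan the generated .mmd text for the marker and rerun with --no-skipping when too many pages were dropped. The sketch below is a hypothetical post-processing check, not part of Nougat. It matches any bracketed token that begins with MISSING_PAGE, on the assumption that the marker may carry a suffix, and the 0.5 threshold is an arbitrary choice.

```python
import re

# Matches "[MISSING_PAGE]" and suffixed variants such as "[MISSING_PAGE_FAIL:3]".
MISSING_PAGE = re.compile(r"\[MISSING_PAGE[^\]]*\]")


def count_missing_pages(mmd_text):
    """Count missing-page markers in converted output."""
    return len(MISSING_PAGE.findall(mmd_text))


def should_retry_without_skipping(mmd_text, total_pages, threshold=0.5):
    """Suggest a rerun with --no-skipping when the missing share reaches the threshold."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return count_missing_pages(mmd_text) / total_pages >= threshold
```

A document where half the pages or more come back as markers is more likely a misfiring heuristic than a genuinely unreadable paper, which is why the check looks at the ratio rather than a single occurrence.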

Verdict

Researchers building RAG pipelines, citation tools, or anything that needs structured text from academic PDFs should try this. If your documents aren't academic papers, or you need to use the weights commercially, look elsewhere.
