lukas-blecher/LaTeX-OCR

Screenshot an equation, get LaTeX back

A Vision Transformer that reads math formulas from images and spits out typeset-ready code.

16.4k stars · Python · Computer Vision
Velocity (7d): +8.2 ★ / day · Trend: steady

What it does

pix2tex takes an image of a mathematical formula—screenshot, photo, or file—and returns the corresponding LaTeX code. It runs as a CLI tool, a GUI snipping utility, a Python API, or a Dockerized Streamlit service. The model checkpoints download automatically on first use.
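As a rough illustration of the Python API route, the sketch below follows the usage pattern from the project README (`LatexOCR` imported from `pix2tex.cli` and called on a PIL image). The wrapper name `formula_to_latex` and its `model` parameter are hypothetical conveniences added here, and the code assumes `pix2tex` and Pillow are installed.

```python
def formula_to_latex(image_path, model=None):
    """Return the predicted LaTeX string for the formula image at image_path.

    `formula_to_latex` is a hypothetical helper, not part of pix2tex itself.
    """
    # Imported lazily so this file can be loaded without pix2tex installed.
    from PIL import Image

    if model is None:
        from pix2tex.cli import LatexOCR
        # Constructing the model fetches the checkpoints on first use.
        model = LatexOCR()

    with Image.open(image_path) as img:
        # The model object is callable on a PIL image and returns a string.
        return model(img)


if __name__ == "__main__":
    import sys

    # Usage: python this_file.py path/to/formula.png
    if len(sys.argv) > 1:
        print(formula_to_latex(sys.argv[1]))
```

Passing an already constructed model in via `model` avoids reloading the weights when several images are processed in a row.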

The interesting bit

The preprocessing step is the quiet hero: a secondary neural network predicts the optimal resolution for your input image, resizing it to match the training distribution. The README is admirably honest that this isn’t magic—“don’t zoom in all the way before taking a picture”—and suggests retrying at different resolutions if the first prediction looks off.
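The retry advice can be sketched as a small loop. This is not how pix2tex does it internally: `predict` is a hypothetical callable that resizes the image to the given size and returns the model output, the scale factors are arbitrary, and the brace check is only a crude stand-in for a human glancing at the result.

```python
def candidate_sizes(width, height, scales=(1.0, 0.75, 0.5, 1.25)):
    """Yield (width, height) pairs to try, mostly smaller than the original."""
    for s in scales:
        yield max(1, round(width * s)), max(1, round(height * s))


def braces_balanced(latex):
    """Cheap sanity check: every unescaped '{' has a matching '}'."""
    depth = 0
    skip = False
    for ch in latex:
        if skip:
            skip = False  # character after a backslash, e.g. the brace in \{
            continue
        if ch == "\\":
            skip = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def predict_with_retries(size, predict, looks_ok=braces_balanced):
    """Try each candidate size until the output passes looks_ok.

    `predict` is an assumed callable: (width, height) -> LaTeX string.
    Returns (latex, size_used), or (None, None) if every attempt fails.
    """
    for candidate in candidate_sizes(*size):
        latex = predict(candidate)
        if looks_ok(latex):
            return latex, candidate
    return None, None
```

A balanced-brace check only catches structurally broken output; a wrong symbol still needs a person to spot it.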

Key highlights

  • Encoder-decoder architecture: ViT with ResNet backbone feeding a Transformer decoder
  • Token accuracy of 0.60 and BLEU score of 0.88 on the benchmark dataset
  • GUI supports Linux screenshot tools across X11 and Wayland (with manual SCREENSHOT_TOOL override for compositor compatibility)
  • Training pipeline included, with data generation via XeLaTeX and KaTeX normalization
  • Handwritten formula support marked as “kinda done” in the training notebook

Caveats

  • Token accuracy of 0.60 means roughly four in ten predicted tokens fail to match the reference; the README explicitly warns to “always double check the result carefully”
  • Beam search, model distillation, and model tracing are all still unchecked on the TODO list
  • Dataset class “needs further improving” per the author’s own note
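The token-accuracy figure is easier to read with a toy example. The sketch below assumes token accuracy is scored position by position against the reference, which is an assumption about the metric rather than something stated above. Under that assumption a single dropped token shifts everything after it, which may help explain how a 0.60 token accuracy can sit next to a 0.88 BLEU score.

```python
def token_accuracy(pred, ref, pad="<pad>"):
    """Fraction of positions where pred and ref hold the same token.

    The shorter sequence is padded, so length mismatches count as errors.
    """
    n = max(len(pred), len(ref))
    if n == 0:
        return 1.0
    p = pred + [pad] * (n - len(pred))
    r = ref + [pad] * (n - len(ref))
    return sum(a == b for a, b in zip(p, r)) / n


def edit_distance(a, b):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


# Reference formula and a prediction that drops one opening brace.
ref = r"\frac { a } { b } + c".split()
pred = r"\frac a } { b } + c".split()

print(edit_distance(pred, ref))   # a single insertion repairs the output
print(token_accuracy(pred, ref))  # yet only the first position lines up
```

Here one missing brace leaves just one of nine positions aligned, even though the fix is a single keystroke. A low token accuracy can therefore overstate how much retyping the output needs, though it does not remove the need to proofread.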

Verdict

Worth a look if you regularly transcribe equations from papers or slides and can tolerate proofreading the output. Not yet a drop-in replacement for manual typing if precision matters.
