grobidOrg/grobid
A machine learning library that parses scientific PDF documents into structured XML/TEI format.

Velocity (7d): +1.0 ★/day · Trend: steady
GROBID uses trained sequence labeling models, either CRF or neural networks, to extract bibliographic metadata, header information, references, and full-text content from scientific PDFs. It outputs TEI-encoded XML suitable for academic databases and research pipelines. The tool exposes a REST API and a command-line interface for batch processing or integration into larger document workflows.
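
To make the REST workflow concrete, here is a minimal Python sketch that posts a PDF to a GROBID service and reads the title out of the returned TEI. It rests on assumptions not stated above: a server running locally at `http://localhost:8070`, the `/api/processFulltextDocument` endpoint, and a multipart form field named `input`. Check these against the documentation for your GROBID version. The helper names (`build_multipart`, `process_fulltext`, `extract_title`) are hypothetical and exist only for this illustration.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_fulltext(pdf_path, base_url="http://localhost:8070"):
    """POST a PDF to the (assumed) full-text endpoint and return raw TEI bytes."""
    with open(pdf_path, "rb") as fh:
        # "input" is the assumed name of the form field carrying the PDF.
        body, content_type = build_multipart("input", "document.pdf", fh.read())
    request = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",  # assumed endpoint path
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()


def extract_title(tei_xml):
    """Return the first title found under titleStmt, or None if absent."""
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if node is None:
        return None
    return "".join(node.itertext()).strip()
```

With a server running, `extract_title(process_fulltext("paper.pdf"))` would return the extracted title; `paper.pdf` is a placeholder path. The sketch sticks to the standard library so it has no dependencies, though a client library such as `requests` would shorten the upload code.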