allenai/papermage
Python library for parsing, representing, and manipulating scientific papers, combining NLP and computer vision techniques.

Velocity (7d): +0.6 ★/day · Trend: steady
Papermage is a research toolkit for extracting and processing structured content from scientific PDFs. It provides recipes and layer-based document representations that segment papers into symbols, pages, rows, and other structured elements. The library combines NLP and computer vision approaches to handle the multimodal nature of scientific documents containing text, figures, tables, and layouts.
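
To make the idea of a layer-based document representation concrete, here is a minimal, self-contained sketch in plain Python. It is not papermage's actual API: the `Span` and `Document` classes, the `annotate`, `text`, and `overlapping` methods, and the `build_example` helper are all hypothetical names chosen for illustration. The sketch assumes a document is one string of symbols, with each layer (rows, tokens, and so on) stored as a list of character spans over that string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) into the document's symbols."""
    start: int
    end: int


@dataclass
class Document:
    """Hypothetical layered document: raw symbols plus named span layers."""
    symbols: str
    layers: dict[str, list[Span]] = field(default_factory=dict)

    def annotate(self, name: str, spans: list[Span]) -> None:
        # Register (or overwrite) a layer such as "rows" or "tokens".
        self.layers[name] = spans

    def text(self, span: Span) -> str:
        # Recover the text covered by a span.
        return self.symbols[span.start:span.end]

    def overlapping(self, name: str, span: Span) -> list[Span]:
        # Cross-layer query: spans in layer `name` that intersect `span`.
        return [
            s for s in self.layers[name]
            if s.start < span.end and span.start < s.end
        ]


def build_example() -> Document:
    # Two rows of text, each split into tokens (illustrative data only).
    doc = Document(symbols="Deep parsing\nof papers")
    doc.annotate("rows", [Span(0, 12), Span(13, 22)])
    doc.annotate("tokens", [Span(0, 4), Span(5, 12), Span(13, 15), Span(16, 22)])
    return doc


if __name__ == "__main__":
    doc = build_example()
    second_row = doc.layers["rows"][1]
    print([doc.text(t) for t in doc.overlapping("tokens", second_row)])
```

Because every layer points into the same symbol string, a query such as "which tokens fall inside this row" reduces to a span-overlap check. A real toolkit would likely attach page coordinates and other metadata to each span as well, which this sketch omits.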