
allenai/papermage

Python library for parsing, representing, and manipulating scientific papers, combining NLP and computer vision techniques.

Velocity (7d): +0.6 ★/day · Trend: steady

Papermage is a research toolkit for extracting and processing structured content from scientific PDFs. It provides recipes and layer-based document representations that segment papers into symbols, pages, rows, and other structured elements. The library combines NLP and computer vision approaches to handle the multimodal nature of scientific documents containing text, figures, tables, and layouts.
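
To make the idea of a layer-based representation concrete, here is a minimal, self-contained sketch in plain Python. It does not use papermage or reproduce its API: the `Span` and `Document` classes, the `intersect` helper, and the sample text are all hypothetical names chosen for illustration. The assumption is that a document can be modeled as one flat string of symbols, with each layer (pages, rows, and so on) stored as a list of character ranges over that string.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) into the document's symbols."""
    start: int
    end: int

    def overlaps(self, other: "Span") -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return self.start < other.end and other.start < self.end


@dataclass
class Document:
    """Toy document (hypothetical): one flat string plus named layers of spans."""
    symbols: str
    layers: Dict[str, List[Span]] = field(default_factory=dict)

    def text(self, span: Span) -> str:
        # Layers never copy text; they only point back into `symbols`.
        return self.symbols[span.start:span.end]

    def intersect(self, span: Span, layer: str) -> List[Span]:
        """Return the spans in `layer` that overlap `span`."""
        return [s for s in self.layers[layer] if s.overlaps(span)]


# Illustrative data: three rows of text spread over two "pages".
doc = Document("Title line\nFirst row\nSecond row")
doc.layers["pages"] = [Span(0, 20), Span(21, 31)]
doc.layers["rows"] = [Span(0, 10), Span(11, 20), Span(21, 31)]

# Cross-layer query: which rows fall on the first page?
first_page = doc.layers["pages"][0]
first_page_rows = [doc.text(r) for r in doc.intersect(first_page, "rows")]
print(first_page_rows)
```

Because every layer indexes into the same symbol string, a query such as "rows on page one" reduces to a span-overlap check, and new layers can be added without touching existing ones.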
