explosion/spacy-layout
A spaCy plugin that converts PDFs and Word documents into structured data and spaCy Doc objects for downstream NLP and RAG processing.

This plugin uses Docling to extract structured data from PDFs, Word documents, and other formats. It produces spaCy Doc objects whose layout spans are labelled by section type (for example text, titles, and section headings), with any tables extracted as pandas DataFrames. The structured output can then feed linguistic analysis, named entity recognition, text classification, and chunking for RAG pipelines.
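As an illustration of the RAG chunking step, here is a minimal sketch that groups labelled layout spans into one chunk per heading. It assumes each span exposes spaCy-style `label_` and `text` attributes, and that headings carry labels such as `"title"` or `"section_header"`. With spacy-layout, the spans would come from `doc.spans["layout"]`. The names `chunk_by_heading`, `Chunk`, and `HEADING_LABELS` are hypothetical helpers, not part of the plugin's API. The demo uses plain namedtuples as stand-ins so it runs without spaCy or a PDF.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class LayoutSpan(Protocol):
    """Anything with spaCy-Span-like `label_` and `text` attributes."""
    label_: str
    text: str


@dataclass
class Chunk:
    heading: str  # nearest preceding heading ("" if none seen yet)
    text: str     # body text collected under that heading


# Assumption: labels treated as headings; adjust to the labels your documents produce.
HEADING_LABELS = {"title", "section_header"}


def chunk_by_heading(spans: Iterable[LayoutSpan]) -> List[Chunk]:
    """Hypothetical helper: one chunk of body text per heading, in document order."""
    chunks: List[Chunk] = []
    heading = ""
    body: List[str] = []

    def flush() -> None:
        # Emit the body gathered under the current heading, if there is any.
        if body:
            chunks.append(Chunk(heading, "\n".join(body)))
            body.clear()

    for span in spans:
        if span.label_ in HEADING_LABELS:
            flush()            # close the previous section
            heading = span.text
        else:
            body.append(span.text)
    flush()                    # close the final section
    return chunks


if __name__ == "__main__":
    # Stand-in spans; with spacy-layout installed you would pass doc.spans["layout"].
    FakeSpan = namedtuple("FakeSpan", ["label_", "text"])
    spans = [
        FakeSpan("title", "Report"),
        FakeSpan("text", "Intro."),
        FakeSpan("section_header", "Results"),
        FakeSpan("text", "A."),
        FakeSpan("text", "B."),
    ]
    for chunk in chunk_by_heading(spans):
        print(chunk.heading, "|", chunk.text)
```

Grouping by heading keeps each chunk topically coherent. A heading with no body text beneath it produces no chunk. A real pipeline might also drop labels such as page headers and footers, or split very long sections further.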