
enoch3712/ExtractThinker

A Python library for extracting and classifying structured data from PDFs, images, and documents using LLMs with ORM-style document workflow abstractions.

1.6k stars · Python · Data Tooling · RAG · Search
Velocity (7 days): +1.8 ★/day · Trend: steady

ExtractThinker is a document intelligence library that uses LLMs to extract and classify structured data from various document formats. It provides flexible document loaders that support OCR engines such as Tesseract and cloud services such as AWS Textract and Google Document AI, and it integrates with multiple LLM providers, including OpenAI and Anthropic. Developers define custom extraction contracts as Pydantic models and can process documents asynchronously, using different splitting strategies to handle large documents efficiently.
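To illustrate the general idea of splitting a large document and processing the pieces concurrently, here is a minimal stdlib-only sketch. It is not ExtractThinker's API. `split_pages`, `fake_llm_extract`, `extract_document` and `BatchResult` are hypothetical names. The dataclass stands in for a Pydantic extraction contract, and the "LLM call" is a stub that just counts lines starting with `ITEM`. The splitting strategy shown is a simple fixed-size page batching, which may differ from the strategies the library actually provides.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BatchResult:
    # Hypothetical stand-in for a Pydantic extraction contract.
    page_range: tuple[int, int]
    line_items: int


def split_pages(pages: list[str], batch_size: int) -> list[list[str]]:
    """Fixed-size splitting strategy: group consecutive pages into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


async def fake_llm_extract(batch: list[str], start: int) -> BatchResult:
    """Hypothetical stub for an LLM extraction call on one batch of pages."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    items = sum(
        line.startswith("ITEM") for page in batch for line in page.splitlines()
    )
    return BatchResult(page_range=(start, start + len(batch) - 1), line_items=items)


async def extract_document(pages: list[str], batch_size: int = 2) -> list[BatchResult]:
    batches = split_pages(pages, batch_size)
    starts = range(0, len(pages), batch_size)
    # Run every batch concurrently; asyncio.gather returns results in input order.
    return await asyncio.gather(
        *(fake_llm_extract(batch, start) for batch, start in zip(batches, starts))
    )


# Usage: five "pages" split into batches of two pages each.
pages = ["ITEM a\nITEM b", "total", "ITEM c", "notes", "ITEM d\nITEM e"]
results = asyncio.run(extract_document(pages, batch_size=2))
for result in results:
    print(result)
```

Because `asyncio.gather` preserves input order, per-batch results can be merged back in page order even though the batches run concurrently. In a real pipeline each stub call would be an LLM request whose output is validated against the contract model.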
