This document parser is officially a ghost
LandingAI's Python wrapper for Agentic Document Extraction has been deprecated in favor of a new library, but the repo still holds 2,400 stars and a useful pattern for API client design.

What it does
agentic-doc is a Python client for LandingAI’s Agentic Document Extraction API. It turns visually complex documents—PDFs, images, URLs—into structured JSON and Markdown, handling the messy parts like splitting 1,000-page PDFs into parallel chunks, retrying on rate limits, and stitching results back together.
The interesting bit
The library treats “just call the API” as harder than it sounds. It auto-splits large documents against page limits, manages thread pools and exponential backoff for 408/429/502-504 errors, and even generates bounding-box visualizations so you can verify the model actually looked where it claims. That’s the kind of boring reliability that separates a demo from production code.
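To make that pattern concrete, here is a minimal sketch of the three moving parts: splitting a page range into chunks, retrying retryable status codes with exponential backoff, and fanning the chunks out over a thread pool. It is not agentic-doc's actual code. Every name in it (`ApiError`, `page_chunks`, `with_backoff`, `parse_document`, the `parse_chunk` callback, and the default limits) is hypothetical and chosen for illustration only.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Status codes the article lists as retried: 408, 429, 502-504.
RETRYABLE = {408, 429, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def page_chunks(total_pages, max_pages):
    """Split [0, total_pages) into consecutive (start, end) ranges of at most max_pages."""
    return [
        (start, min(start + max_pages, total_pages))
        for start in range(0, total_pages, max_pages)
    ]


def with_backoff(call, max_retries=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Run call(), retrying retryable errors with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ApiError as err:
            # Give up on non-retryable errors or once the retry budget is spent.
            if err.status not in RETRYABLE or attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


def parse_document(total_pages, parse_chunk, max_pages=50, max_workers=4, sleep=time.sleep):
    """Parse each page range in parallel and return the results in page order."""
    chunks = page_chunks(total_pages, max_pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so stitching is a plain list().
        results = pool.map(
            lambda c: with_backoff(lambda: parse_chunk(*c), sleep=sleep),
            chunks,
        )
        return list(results)
```

Here `parse_chunk(start, end)` stands in for whatever function sends one page range to the API. Because `Executor.map` preserves input order, the per-chunk results come back already sorted by page, which keeps the stitching step trivial.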
Key highlights
- Single parse() function handles files, URLs, raw bytes, or connector configs (S3, Google Drive, local directories)
- Pydantic models for typed field extraction with per-field confidence scores
- Configurable parallelism and retries via environment variables or .env files, with no code changes needed
- Visual debugging tools: save grounding snippets as PNGs or generate full annotated page images
- Still maintained enough to keep its CI badges, though officially legacy
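The environment-variable approach to tuning is easy to reproduce in your own client. The sketch below assumes a hypothetical `DOCCLIENT_` prefix and two made-up settings. These are not agentic-doc's real variable names, and the sketch reads only the process environment (loading a .env file would need an extra library such as python-dotenv).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical tuning knobs with in-code defaults."""

    max_workers: int = 4
    max_retries: int = 5


def load_settings(env=None, prefix="DOCCLIENT_"):
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = ClientSettings()
    return ClientSettings(
        max_workers=int(env.get(prefix + "MAX_WORKERS", defaults.max_workers)),
        max_retries=int(env.get(prefix + "MAX_RETRIES", defaults.max_retries)),
    )
```

Passing a plain dict as `env` keeps the function testable without touching the real environment, while callers in production simply use `load_settings()`.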
Caveats
- Deprecated: README opens with a deprecation warning pointing to landingai-ade for new projects
- Requires LandingAI API key; not a self-hosted or offline solution
- Python 3.9–3.12 only
Verdict
Worth studying if you’re building a similar API client wrapper—it’s a solid reference for handling pagination, retries, and batch parallelism. Don’t start new projects here; use landingai-ade instead. If you need offline document parsing, this was never the tool for you.