landing-ai/agentic-doc

This document parser is officially a ghost

LandingAI's Python wrapper for Agentic Document Extraction has been deprecated in favor of a new library, but the repo still holds 2,400 stars and a useful pattern for API client design.

2.4k stars · Python · Data Tooling · Domain Apps · 7-day velocity: +5.3 ★/day · Trend: steady

What it does

agentic-doc is a Python client for LandingAI’s Agentic Document Extraction API. It turns visually complex documents—PDFs, images, URLs—into structured JSON and Markdown, handling the messy parts like splitting 1,000-page PDFs into parallel chunks, retrying on rate limits, and stitching results back together.
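The split, parallelize, and stitch flow described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general pattern, not agentic-doc's actual code: `split_pages`, `parse_in_chunks`, and `fake_parse_chunk` are hypothetical names, and the chunk size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_pages(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) page ranges covering 0..total_pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def parse_in_chunks(
    total_pages: int,
    chunk_size: int,
    parse_chunk: Callable[[int, int], str],
    max_workers: int = 4,
) -> List[str]:
    """Parse each page range in a thread pool and return results in page order."""
    ranges = split_pages(total_pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the stitched
        # output follows page order even if chunks finish out of order.
        return list(pool.map(lambda r: parse_chunk(*r), ranges))


def fake_parse_chunk(start: int, end: int) -> str:
    """Hypothetical stand-in for a real API call on pages start..end."""
    return f"pages {start + 1}-{end}"
```

With these definitions, `split_pages(1000, 300)` produces four ranges, the last one covering only the final 100 pages. Relying on `Executor.map` to preserve input order keeps the stitching step trivial.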

The interesting bit

The library treats “just call the API” as harder than it sounds. It auto-splits large documents against page limits, manages thread pools and exponential backoff for 408/429/502-504 errors, and even generates bounding-box visualizations so you can verify the model actually looked where it claims. That’s the kind of boring reliability that separates a demo from production code.
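To make the retry behavior concrete, here is a hedged sketch of exponential backoff over the status codes listed above. The status set comes from the description; everything else (the function names, the base delay, the cap, the jitter strategy, and the shape of the `send` callable) is an assumption for illustration and is not taken from the library.

```python
import random
import time
from typing import Callable, Tuple

# Status codes treated as transient, per the description above.
RETRYABLE_STATUS = {408, 429, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Un-jittered delay before retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical callable returning (status_code, body).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            return status, body
        delay = backoff_delay(attempt, base, cap)
        if jitter:
            # Full jitter: pick a random delay up to the exponential bound
            # so parallel workers do not retry in lockstep.
            delay = random.uniform(0, delay)
        sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter makes the loop testable without real waiting, and the jitter matters most when many chunks hit a rate limit at the same moment.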

Key highlights

  • Single parse() function handles files, URLs, raw bytes, or connector configs (S3, Google Drive, local directories)
  • Pydantic models for typed field extraction with per-field confidence scores
  • Configurable parallelism and retries via environment variables or .env files—no code changes needed
  • Visual debugging tools: save grounding snippets as PNGs or generate full annotated page images
  • Officially legacy, but still maintained enough to carry CI badges
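The "no code changes needed" configuration style from the list above might look like the following sketch. The variable names (`MAX_WORKERS`, `MAX_RETRIES`, `BATCH_SIZE`), the defaults, and the `ClientSettings` class are hypothetical and should be checked against the library's README before relying on them; loading a .env file is left out and would typically be handled by a separate package.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical tuning knobs; names and defaults are assumptions."""
    max_workers: int = 4
    max_retries: int = 5
    batch_size: int = 2


def load_settings(env: Mapping[str, str] = os.environ) -> ClientSettings:
    """Build settings from environment variables, falling back to defaults."""
    defaults = ClientSettings()
    return ClientSettings(
        max_workers=int(env.get("MAX_WORKERS", defaults.max_workers)),
        max_retries=int(env.get("MAX_RETRIES", defaults.max_retries)),
        batch_size=int(env.get("BATCH_SIZE", defaults.batch_size)),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to test: `load_settings({"MAX_WORKERS": "8"})` overrides one value and leaves the rest at their defaults.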

Caveats

  • Deprecated: README opens with a deprecation warning pointing to landingai-ade for new projects
  • Requires LandingAI API key; not a self-hosted or offline solution
  • Python 3.9–3.12 only

Verdict

Worth studying if you’re building a similar API client wrapper: it’s a solid reference for handling document chunking, retries, and batch parallelism. Don’t start new projects here; use landingai-ade instead. If you need offline document parsing, this was never the tool for you.
