Zipstack/unstract

Your PDFs have structure, you just need to ask nicely

Unstract turns document extraction into a prompt-and-deploy workflow instead of a regex archaeology dig.

★6.7k stars Python Domain Apps Data Tooling Inference · Serving

What it does

Unstract is a self-hostable platform that feeds documents (PDFs, scans, spreadsheets, images) to LLMs and returns structured JSON. You describe what you want in natural language via a “Prompt Studio,” then expose the result as a REST API, an ETL pipeline, or an n8n node. The stack is familiar: React frontend, Django backend, Celery workers, PostgreSQL, Redis, RabbitMQ, all wrapped in Docker Compose.
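To make the "expose the result as a REST API" step concrete, here is a minimal sketch of what a client call might look like. It only builds the request and does not send it. The endpoint URL, the API key, the `files` form-field name, and Bearer-token auth are all assumptions for illustration; copy the real values from your own deployment rather than trusting these.

```python
import uuid
import urllib.request


def build_extraction_request(endpoint_url: str, api_key: str,
                             filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one document.

    Assumptions, not confirmed by the project: the deployment accepts
    multipart/form-data with a field named "files" and a Bearer token.
    """
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


# Hypothetical placeholder values; nothing is sent over the network here.
request = build_extraction_request(
    "https://unstract.example/your-deployment-endpoint",
    "your-api-key",
    "invoice.pdf",
    b"%PDF-1.7 sample bytes",
)
```

Passing `request` to `urllib.request.urlopen` and decoding the response body as JSON would complete the round trip, assuming the deployment replies synchronously.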

The interesting bit

The bet here is that prompt engineering replaces template engineering. Rather than maintaining brittle regexes per vendor or document type, you write a schema description once and let the LLM handle layout variations. The README’s “Current State vs. Unstract” table is unusually honest about this trade-off: it knows you’re currently suffering through “regex, build templates per vendor.”
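A rough sketch of the "describe the schema once" idea, in plain Python. This is not Unstract's code or Prompt Studio's format: the field names, the prompt wording, and the `build_prompt` / `parse_reply` helpers are all hypothetical, and the LLM call itself is left out. The point is that one field description serves every vendor layout, where the regex approach needs one pattern per layout.

```python
import json

# Hypothetical schema: field name -> natural-language description.
INVOICE_FIELDS = {
    "vendor_name": "Legal name of the company issuing the invoice",
    "invoice_number": "The invoice identifier as printed",
    "total_amount": "Grand total including tax, as a number",
}


def build_prompt(fields: dict, document_text: str) -> str:
    """Turn the schema into a single extraction prompt for any layout."""
    lines = [f'- "{name}": {description}' for name, description in fields.items()]
    return (
        "Extract the following fields from the document and reply with one "
        "JSON object using exactly these keys. Use null for absent fields.\n"
        + "\n".join(lines)
        + "\n\nDocument:\n"
        + document_text
    )


def parse_reply(fields: dict, reply: str) -> dict:
    """Parse the model's JSON reply and keep only the fields we asked for."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = sorted(set(fields) - set(data))
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return {name: data[name] for name in fields}
```

The validation step matters as much as the prompt: an LLM reply is untrusted input, so checking it against the schema is what lets downstream code rely on the shape.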

Key highlights

Broad format support: PDF, DOCX, XLSX, PPTX, and common image formats
Pluggable LLM providers: OpenAI, Anthropic, Bedrock, Gemini, Ollama, Mistral, plus an “OpenAI Compatible” catch-all
Vector DB adapters: Qdrant, Pinecone, Weaviate, Milvus, PostgreSQL
ETL sources and destinations: S3, GCS, Azure Blob, Snowflake, BigQuery, Redshift, and major SQL databases
MCP server for agent integration (Claude, etc.) and an n8n custom node
One-script local deploy: ./run-platform.sh with default credentials unstract / unstract

Caveats

Requires 8 GB RAM minimum and Docker; not a lightweight sidecar
The encryption key warning is worth heeding: lose ENCRYPTION_KEY and your adapter credentials are gone
Enterprise features (dual-LLM verification, human-in-the-loop, SSO) are cloud-only; the open-source build is the extraction engine without the guardrails
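If you want to check the first caveat before running anything, a small preflight along these lines would do. It is a sketch, not part of Unstract: the `preflight` helper and the threshold are my own, and the 8 GB figure is read as decimal gigabytes so that a nominal 8 GB machine, which reports slightly less than 8 GiB to the OS, still passes.

```python
import os
import shutil
from typing import Optional

# Assumption: treat the stated 8 GB minimum as decimal gigabytes.
MIN_RAM_BYTES = 8 * 10**9


def detect_total_ram() -> Optional[int]:
    """Return physical RAM in bytes, or None where sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are missing elsewhere.
        return None


def preflight(total_ram: Optional[int], docker_path: Optional[str]) -> list:
    """Return a list of human-readable problems; empty means good to go."""
    problems = []
    if docker_path is None:
        problems.append("docker not found on PATH")
    if total_ram is None:
        problems.append("could not determine installed RAM")
    elif total_ram < MIN_RAM_BYTES:
        problems.append(f"only {total_ram / 10**9:.1f} GB RAM, want at least 8")
    return problems


if __name__ == "__main__":
    for problem in preflight(detect_total_ram(), shutil.which("docker")):
        print("warning:", problem)
```

Treat the result as advisory: it says nothing about how much of that RAM is free, or whether Docker itself has been given a memory limit below the host total.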

Verdict

Worth a spin if you’re currently maintaining a graveyard of per-vendor document parsers and want to consolidate on LLM prompts. Skip it if your documents are already clean, your volumes are tiny, or you treat 8 GB RAM as extravagant.

Frequently asked

What is Zipstack/unstract? Unstract turns document extraction into a prompt-and-deploy workflow instead of a regex archaeology dig.
Is unstract open source? Yes, Zipstack/unstract is open source, released under the AGPL-3.0 license.
What language is unstract written in? Zipstack/unstract is primarily written in Python.
How popular is unstract? Zipstack/unstract has 6.7k stars on GitHub.
Where can I find unstract? Zipstack/unstract is on GitHub at https://github.com/Zipstack/unstract.