RAG that actually reads the diagrams
Morphik is an end-to-end document store and search system built for visually rich, multimodal documents—because your PDFs contain more than text.

What it does
Morphik ingests, stores, and searches unstructured documents (PDFs, images, videos) through a single Python SDK or REST API. It handles the full pipeline from ingestion to query, including metadata extraction and integrations with Google Suite, Slack, and Confluence. You can self-host via Docker or use its managed cloud tier.
The interesting bit
The project leans on ColPali for multimodal search, meaning it indexes visual content directly rather than flattening diagrams and charts into broken text fragments. The README is refreshingly blunt about why traditional RAG fails: “Charts become meaningless text fragments. Critical diagrams lose their spatial relationships.” Morphik treats the visual layer as first-class data, not an afterthought.
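To make "indexes visual content directly" concrete, here is a minimal sketch of the late-interaction (MaxSim) scoring that ColPali-style retrievers use: each page is stored as a set of image-patch embeddings, and every query-token embedding is matched against its best patch. This is not Morphik's code. The function names, the two-dimensional toy vectors, and the page labels are all made up for illustration; real embeddings come from a vision-language model and have far more dimensions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query-token vector, take its best
    match among the page's patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector], pages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page against the query and sort best-first."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two query-token vectors and two pages of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "diagram_page": [[0.9, 0.1], [0.1, 0.9]],  # distinct patches match each token well
    "text_page": [[0.5, 0.5], [0.4, 0.4]],     # no patch matches either token strongly
}

ranking = rank_pages(query, pages)
print(ranking[0][0])  # the page whose patches best cover the query tokens
```

Because each query token picks its own best patch, a page can score well when different regions of a diagram answer different parts of the question, which is the spatial information that text flattening throws away.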
Key highlights
- Multimodal search via ColPali across images, PDFs, and video through one endpoint
- Rules-based metadata extraction with bounding boxes and classification
- MCP (Model Context Protocol) support for LLM tool integration
- Business Source License 1.1: free for personal/indie use, free for commercial use below $2K/month in revenue, with each release converting to Apache 2.0 four years after it ships
- Self-hosted deployments available but explicitly “not fully supported” by the team
Caveats
- The project pushes hard toward the managed cloud service; self-hosters get installation guides and Discord, but the README warns of limited support resources
- A June 2025 auth migration is required for existing self-hosted installs to avoid performance degradation
Verdict
Worth evaluating if you’re building RAG over technical documentation, manuals, or any visually dense corpus where diagram understanding matters. Skip it if you need a fully community-supported, pure-open-source vector database without commercial nudges.