HKUDS/RAG-Anything

When your PDF has charts, tables, and math, plain RAG chokes

RAG-Anything extends LightRAG to ingest and query documents that mix text, images, equations, and tables without splitting the work across half a dozen tools.

Velocity (7d): +57 ★ / day · Trend: steady

What it does

RAG-Anything is a multimodal retrieval system built atop LightRAG. It parses PDFs, Office files, and images into structured chunks—text blocks, visuals, tables, and math—then indexes them into a knowledge graph so you can query everything through one interface. The project also supports injecting pre-parsed content lists directly, skipping the parser when you already have clean data.
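To make the idea of a pre-parsed content list concrete, here is a minimal sketch of what such a list could look like and how it might be bucketed by modality. The field names (`type`, `page`, `rows`, `latex`, `path`) and the `group_by_type` helper are hypothetical, chosen for illustration; they are not taken from RAG-Anything's actual schema or API.

```python
from collections import defaultdict

# Hypothetical pre-parsed content list: one dict per block, tagged by modality.
# Field names are illustrative assumptions, not RAG-Anything's real schema.
content_list = [
    {"type": "text", "page": 0, "text": "Revenue grew 12% year over year."},
    {"type": "table", "page": 0, "rows": [["Year", "Revenue"], ["2023", "112"]]},
    {"type": "equation", "page": 1, "latex": r"g = \frac{R_t - R_{t-1}}{R_{t-1}}"},
    {"type": "image", "page": 1, "path": "figures/revenue_chart.png"},
]


def group_by_type(blocks):
    """Bucket blocks by modality so each bucket can go to its own handler."""
    buckets = defaultdict(list)
    for block in blocks:
        buckets[block["type"]].append(block)
    return dict(buckets)


grouped = group_by_type(content_list)
print({kind: len(items) for kind, items in grouped.items()})
```

Supplying data in a shape like this is what lets the parsing step be skipped: the blocks already carry their modality and position, so only indexing remains.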

The interesting bit

The framework routes each content type through its own pipeline concurrently, then ties results together with cross-modal relationships in a shared knowledge graph. That means a question about a chart can pull in the surrounding paragraph and the table on the next page as joint context, rather than treating them as isolated documents.
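The routing-then-linking idea can be sketched in a few lines of standard-library Python. This is a toy model under stated assumptions, not RAG-Anything's implementation: the handlers, the block fields, and the rule that links blocks of different types on the same or adjacent pages are all hypothetical stand-ins for whatever the framework actually does.

```python
import asyncio
from collections import defaultdict


# Hypothetical per-modality handlers; a real system would call parsers or models.
async def handle_text(block):
    return f"text: {block['text']}"


async def handle_table(block):
    return f"table with {len(block['rows'])} rows"


async def handle_image(block):
    return f"image at {block['path']}"


HANDLERS = {"text": handle_text, "table": handle_table, "image": handle_image}


async def process(blocks):
    # Run one handler per block concurrently; gather preserves input order.
    summaries = await asyncio.gather(*(HANDLERS[b["type"]](b) for b in blocks))
    nodes = {
        b["id"]: {"page": b["page"], "type": b["type"], "summary": s}
        for b, s in zip(blocks, summaries)
    }
    # Assumed linking rule: connect blocks of different types that sit on
    # the same or adjacent pages.
    edges = defaultdict(set)
    ids = list(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            different_type = nodes[a]["type"] != nodes[b]["type"]
            nearby = abs(nodes[a]["page"] - nodes[b]["page"]) <= 1
            if different_type and nearby:
                edges[a].add(b)
                edges[b].add(a)
    return nodes, edges


def joint_context(node_id, nodes, edges):
    """Return the node's own summary followed by its linked neighbours'."""
    neighbours = sorted(edges[node_id])
    return [nodes[node_id]["summary"]] + [nodes[n]["summary"] for n in neighbours]


blocks = [
    {"id": "p1", "type": "text", "page": 3, "text": "Figure 2 shows quarterly churn."},
    {"id": "fig2", "type": "image", "page": 3, "path": "fig2.png"},
    {"id": "tab1", "type": "table", "page": 4, "rows": [["Q", "Churn"], ["Q1", "4%"]]},
    {"id": "p9", "type": "text", "page": 9, "text": "Unrelated appendix note."},
]
nodes, edges = asyncio.run(process(blocks))
context = joint_context("fig2", nodes, edges)
print(context)
```

In this toy run, a query that lands on the chart `fig2` picks up the paragraph on the same page and the table on the next page, while the distant appendix note stays out of the context.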

Key highlights

  • Built on LightRAG; inherits its graph-based retrieval and adds multimodal layers on top
  • Uses MinerU for layout-aware document parsing and structure extraction
  • Supports VLM-enhanced queries: images from documents get fed into a vision-language model alongside text for joint reasoning
  • Handles PDFs, Word, PowerPoint, Excel, and images through format-specific parsers
  • Published technical report on arXiv (2510.12323); ~21k GitHub stars at time of writing

Caveats

  • The README is heavy on gradient backgrounds and emoji, light on concrete benchmarks or latency numbers
  • “Concurrent multi-pipeline architecture” is claimed but no throughput metrics or hardware requirements are listed
  • The framework is young enough that real-world stress testing at scale is not yet visible in the docs

Verdict

Worth a look if you manage technical documentation, financial reports, or research papers where text and visuals are tightly coupled. Skip it if your corpus is already clean, unimodal text: vanilla LightRAG or a simpler vector store will have fewer moving parts.
