yobix-ai/extractous

A high-performance Rust library for extracting text, metadata, and structure from unstructured documents like PDFs and Word files.

1.8k stars · Rust · Data Tooling
Velocity (7d): +2.4 ★ / day · Trend: steady

Extractous is a document content extraction library written in Rust that processes PDFs, Word documents, HTML, and other formats. It provides bindings for Python, Node.js, Go, and other languages. The project explicitly positions itself as infrastructure for retrieval-augmented generation (RAG) and LLM workflows, and claims to be 25x faster than the unstructured-io library commonly used in AI document processing pipelines. It includes OCR capabilities and is designed to feed extracted content into machine learning and NLP systems.
