
wisupai/e2m

A Python library that converts various document formats (PDF, DOCX, EPUB, audio) into Markdown specifically for AI data pipelines.

1.3k stars · Jupyter Notebook · Data Tooling · RAG · Search
Velocity (7d): +1.9 ★ / day · Trend: steady

E2M is a document conversion library that uses a parser-converter architecture to transform files such as PDFs, Word documents, audio, and web pages into clean Markdown. The project explicitly targets Retrieval-Augmented Generation systems and model fine-tuning as its end use cases. It provides dedicated parsers for different file types and supports custom configurations, with the aim of producing text extraction clean enough for LLM consumption.
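To make the parser-converter split concrete, here is a minimal sketch of the general pattern in Python. It is not E2M's API: every name in it (`ParsedDocument`, `parse_plain_text`, `PARSERS`, `convert_to_markdown`, `to_markdown`) is hypothetical, and the toy parser only handles plain text. The assumed idea is that a per-file-type parser produces an intermediate structure, and a separate converter turns that structure into Markdown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParsedDocument:
    """Intermediate result produced by a parser (hypothetical type, not from E2M)."""
    title: str
    paragraphs: List[str]


def parse_plain_text(raw: str) -> ParsedDocument:
    """Toy parser: the first blank-line-separated block is the title, the rest are paragraphs."""
    blocks = [b.strip() for b in raw.split("\n\n") if b.strip()]
    title, body = (blocks[0], blocks[1:]) if blocks else ("", [])
    # Collapse internal whitespace and line wraps inside each paragraph.
    return ParsedDocument(title=title, paragraphs=[" ".join(p.split()) for p in body])


# Registry mapping a file type to its dedicated parser (hypothetical).
PARSERS: Dict[str, Callable[[str], ParsedDocument]] = {"txt": parse_plain_text}


def convert_to_markdown(doc: ParsedDocument) -> str:
    """Converter stage: render the intermediate structure as Markdown."""
    parts = [f"# {doc.title}"] if doc.title else []
    parts.extend(doc.paragraphs)
    return "\n\n".join(parts) + "\n"


def to_markdown(raw: str, file_type: str) -> str:
    """Pick the parser for the file type, then run the shared converter."""
    parser = PARSERS.get(file_type)
    if parser is None:
        raise ValueError(f"no parser registered for {file_type!r}")
    return convert_to_markdown(parser(raw))
```

In this sketch, supporting a new format would mean registering another parser in `PARSERS`, while the converter stage stays unchanged. Whether E2M organizes its own parsers and converters this way is not stated in the listing.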
