wisupai/e2m
A Python library that converts various document formats (PDF, DOCX, EPUB, audio) into Markdown specifically for AI data pipelines.

Velocity (7d): +1.9 ★/day · Trend: steady
E2M is a document conversion library that uses a parser-converter architecture to transform PDFs, Word documents, audio, web pages, and other inputs into clean Markdown. The project explicitly targets Retrieval-Augmented Generation (RAG) systems and model fine-tuning as its end use cases. It provides dedicated parsers for the different file types and supports custom configuration, with the aim of producing extracted text clean enough for LLM consumption.
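To make the parser-converter split concrete, here is a minimal sketch of the general pattern in plain Python. It does not use or reproduce E2M's actual API: `ParsedDocument`, `parse_plain_text`, `convert_to_markdown`, `PARSERS`, and `to_markdown` are all hypothetical names chosen for illustration, and the only "format" handled is plain text. The idea it shows is that a format-specific parser produces an intermediate structure, and a single format-agnostic converter turns that structure into Markdown.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class ParsedDocument:
    """Intermediate result handed from a parser to the converter (hypothetical type)."""
    title: str
    paragraphs: List[str]


def parse_plain_text(raw: str) -> ParsedDocument:
    """Parser stage: split raw text into blank-line-separated blocks.

    The first block is treated as the title; this is an assumption of the sketch.
    """
    blocks = [b.strip() for b in raw.split("\n\n") if b.strip()]
    title = blocks[0] if blocks else ""
    return ParsedDocument(title=title, paragraphs=blocks[1:])


def convert_to_markdown(doc: ParsedDocument) -> str:
    """Converter stage: render the intermediate structure as Markdown."""
    parts = [f"# {doc.title}"] if doc.title else []
    # Collapse hard-wrapped lines and repeated spaces inside each paragraph.
    parts.extend(" ".join(p.split()) for p in doc.paragraphs)
    return "\n\n".join(parts) + "\n"


# Registry mapping file extensions to parsers; a real library would register
# one parser per supported format (PDF, DOCX, audio, ...).
PARSERS: Dict[str, Callable[[str], ParsedDocument]] = {".txt": parse_plain_text}


def to_markdown(filename: str, raw: str) -> str:
    """Pick a parser by file extension, then run the shared converter."""
    suffix = Path(filename).suffix.lower()
    try:
        parser = PARSERS[suffix]
    except KeyError:
        raise ValueError(f"no parser registered for {suffix!r}") from None
    return convert_to_markdown(parser(raw))
```

Keeping the converter independent of the input format means a new file type only needs a new parser entry in the registry; the Markdown rendering logic stays in one place.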