raphael-seo/Versatile-OCR-Program
Multi-modal OCR pipeline for extracting text, figures, math, tables, and diagrams from documents, designed for ML training data preparation.

Velocity (7d): +1.6 ★ / day · Trend: steady
This project provides an OCR system optimized for machine learning workflows, extracting structured content from documents including text, figures, math equations, tables, and diagrams. It uses a modular, config-driven architecture and can leverage OpenAI models as part of its pipeline. The system is designed to produce clean, ML-ready datasets from academic papers, exams, and educational materials.
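As a rough illustration of what a modular, config-driven extraction pipeline can look like, here is a minimal sketch. It is not taken from the project's code: `Region`, `Record`, `REGISTRY`, `register`, `run_pipeline`, and the `enabled` config key are all hypothetical names chosen for this example, and the extractors are trivial stand-ins for real OCR or model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    """Hypothetical detected region of a page."""
    kind: str      # content type, e.g. "text", "math", "table"
    payload: str   # stand-in for the raw recognized content


@dataclass
class Record:
    """Hypothetical ML-ready output record."""
    kind: str
    content: str


Extractor = Callable[[Region], Record]

# Maps a content type to the function that handles it (assumed design).
REGISTRY: Dict[str, Extractor] = {}


def register(kind: str) -> Callable[[Extractor], Extractor]:
    """Decorator that adds an extractor to the registry under `kind`."""
    def wrap(fn: Extractor) -> Extractor:
        REGISTRY[kind] = fn
        return fn
    return wrap


@register("text")
def extract_text(region: Region) -> Record:
    return Record("text", region.payload.strip())


@register("math")
def extract_math(region: Region) -> Record:
    # Wrap the expression in inline LaTeX delimiters.
    return Record("math", f"${region.payload.strip()}$")


@register("table")
def extract_table(region: Region) -> Record:
    # Turn comma-separated lines into pipe-separated rows.
    rows = [
        " | ".join(cell.strip() for cell in line.split(","))
        for line in region.payload.splitlines()
    ]
    return Record("table", "\n".join(rows))


def run_pipeline(regions: List[Region], config: dict) -> List[Record]:
    """Run only the extractors that the config enables, in input order."""
    enabled = set(config.get("enabled", []))
    records = []
    for region in regions:
        extractor = REGISTRY.get(region.kind)
        if region.kind in enabled and extractor is not None:
            records.append(extractor(region))
    return records


if __name__ == "__main__":
    config = {"enabled": ["text", "math"]}  # tables are switched off here
    page = [
        Region("text", "  Solve for x.  "),
        Region("table", "a, b\n1, 2"),
        Region("math", "x^2 = 4"),
    ]
    for record in run_pipeline(page, config):
        print(record.kind, "->", record.content)
```

In a design like this, adding a new content type means registering one more function, and turning a stage on or off is a config change rather than a code change. A model-backed stage (for example, one that calls an external API to describe a figure) would slot in as just another registered extractor.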