CosmosShadow/gptpdf
A Python tool that leverages GPT-4o's visual capabilities to convert PDF documents into markdown format with support for complex layout elements.

The project parses PDFs by first using PyMuPDF to identify non-text areas in documents, then feeding those regions to a large visual model (GPT-4o) to extract content and convert it to markdown. It handles typography, mathematical formulas, tables, images, and charts with reportedly low cost (~$0.013 per page). The tool is implemented in about 293 lines of code and depends on a GeneralAgent library for OpenAI API interaction.