pymupdf/pymupdf4llm
A PDF processing library for extracting text and images for use with Large Language Models.

Velocity (7d): +2.2 ★/day · Trend: steady
PyMuPDF4LLM is a companion package built on top of the PyMuPDF library, not a fork of it, that simplifies PDF content extraction for LLM applications. It extracts text, tables, and images from PDF documents and emits them as Markdown, a format well suited to language models and retrieval pipelines. Because it relies on PyMuPDF for parsing, it fits alongside other PyMuPDF-based code in AI data preparation pipelines.
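As a rough illustration of the extraction step described above, the sketch below converts a PDF to Markdown with `pymupdf4llm.to_markdown` and then splits the result at headings so that each piece can be passed to a model separately. The wrapper `pdf_to_markdown` and the helper `split_by_heading` are hypothetical names introduced here and are not part of the library. The heading-based split is only one possible chunking strategy.

```python
import re
from typing import List

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^#{1,6}\s")


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF file to a Markdown string using PyMuPDF4LLM."""
    # Imported lazily so the pure-Python helper below can be used
    # even when the package is not installed (pip install pymupdf4llm).
    import pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path)


def split_by_heading(markdown: str) -> List[str]:
    """Hypothetical helper: split Markdown into chunks, one per heading."""
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far and starts a new one.
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only blank lines.
    return [chunk for chunk in chunks if chunk]
```

A typical call would be `split_by_heading(pdf_to_markdown("input.pdf"))`, where `input.pdf` is a placeholder path. Splitting at headings keeps each section's text together, which tends to suit retrieval-style pipelines, though page-based or fixed-size chunking may work better for documents without clear headings.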