pymupdf/pymupdf4llm

A PDF processing library for extracting text and images for use with Large Language Models.

1.8k stars · Python · Data Tooling
Velocity (7d): +2.2 ★/day · Trend: steady

PyMuPDF4LLM is a companion package built on top of the PyMuPDF library for extracting PDF content for LLM applications. It converts text, tables, and images from PDF documents into output suited to language models, such as Markdown. It works alongside the rest of the PyMuPDF ecosystem and adds features aimed at AI data-preparation pipelines.
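As a rough illustration of the extraction described above, here is a minimal sketch that turns a PDF into Markdown text. It assumes the package is installed (`pip install pymupdf4llm`) and relies only on its `to_markdown` function. The wrapper name `pdf_to_markdown`, its `converter` parameter, and the file name `input.pdf` are hypothetical and exist only for this example.

```python
from typing import Callable, Optional


def pdf_to_markdown(
    pdf_path: str,
    converter: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the content of a PDF as a Markdown string.

    `converter` is a hypothetical hook for this sketch: it lets the
    function be exercised without the library installed. When it is
    omitted, pymupdf4llm.to_markdown does the actual conversion.
    """
    if converter is None:
        # Imported lazily so this module still loads when pymupdf4llm
        # is not available in the environment.
        import pymupdf4llm

        converter = pymupdf4llm.to_markdown
    return converter(pdf_path)


# Typical use (assumes a file named input.pdf exists):
#   md_text = pdf_to_markdown("input.pdf")
#   print(md_text[:500])
```

The resulting Markdown string can then be split into chunks or passed to a prompt as-is, depending on the pipeline.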
