DeepSeek-OCR wrapped in Docker so you don't have to wrestle it
A practical packaging job around DeepSeek's vision model, turning PDF chaos into structured Markdown with a FastAPI shim and batch scripts.

What it does
This repo wraps DeepSeek-OCR in a Dockerized FastAPI service and provides five Python scripts that scan a data/ folder, feed PDFs to the model, and write out Markdown (or raw OCR text). You can call the REST API for live requests, or drop files into data/ and run the batch scripts locally. The scripts differ mainly in post-processing: basic conversion, image extraction, custom prompts from a YAML file, or plain text extraction.
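As a rough illustration of the batch workflow, the sketch below scans a folder for PDFs and works out where each output would land, using the output suffixes the project documents (-MD.md, -OCR.md, -CUSTOM.md). The function names and the mode labels are hypothetical, and placing outputs next to the source PDF is an assumption; this is not the repo's own code.

```python
from pathlib import Path

# Suffixes come from the project's naming convention; the keys are
# hypothetical labels chosen for this sketch.
SUFFIXES = {"markdown": "-MD.md", "ocr": "-OCR.md", "custom": "-CUSTOM.md"}


def output_path(pdf: Path, mode: str) -> Path:
    """Return the output file next to `pdf` for the given mode (assumed layout)."""
    return pdf.with_name(pdf.stem + SUFFIXES[mode])


def pending_pdfs(data_dir: Path, mode: str) -> list[Path]:
    """List PDFs in `data_dir` that have no output yet for this mode."""
    return sorted(
        pdf
        for pdf in data_dir.glob("*.pdf")
        if not output_path(pdf, mode).exists()
    )
```

Because each mode has its own suffix, running two modes over the same folder leaves both results side by side instead of overwriting one another.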
The interesting bit
The real work here isn’t the model; it’s the duct tape. The README explicitly notes that this project patches a bug in the upstream DeepSeek-OCR library, where tokenize_with_images() fails on startup because the prompt parameter is missing. The Docker build swaps in the fixed files transparently. That’s a real service to anyone who has tried to run the original and hit an opaque initialization crash.
Key highlights
- FastAPI backend with /ocr/image, /ocr/pdf, and /ocr/batch endpoints
- Five processor scripts with a clear suffix convention (-MD.md, -OCR.md, -CUSTOM.md) so you can compare outputs side-by-side
- Enhanced variants extract images to data/images/ and clean up special tokens
- Custom prompt support via custom_prompt.yaml for experimenting with extraction instructions
- Includes copy-paste Python and JavaScript client examples
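To give a feel for calling the service, here is a minimal standard-library sketch that uploads one PDF to the /ocr/pdf endpoint. Only the endpoint path comes from the project; the base URL, the form field name "file", and the JSON response are assumptions, so check the repo's own client examples for the real request shape.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: where the container is published


def build_pdf_request(
    pdf_name: str, pdf_bytes: bytes, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /ocr/pdf.

    The form field name "file" is an assumption, not taken from the repo.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{pdf_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/ocr/pdf",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def ocr_pdf(path: Path, base_url: str = BASE_URL) -> dict:
    """Send one PDF and decode the body as JSON (assumed response format)."""
    req = build_pdf_request(path.name, path.read_bytes(), base_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the client at a running container.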
Caveats
- Hardware appetite is serious: 12GB VRAM minimum, 32GB RAM (64GB+ recommended), 50GB storage
- Requires an NVIDIA GPU with CUDA 11.8+, Docker with GPU support, and the NVIDIA Container Toolkit, so this won’t run on your laptop’s integrated graphics
- The README is truncated mid-sentence in the “Custom Files” section, so the full list of patches isn’t visible
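Before pulling a large image, it can be worth checking the GPU against the VRAM floor listed above. The sketch below is one way to do that by parsing nvidia-smi's CSV query output; the helper names are hypothetical, and the threshold is set a little under 12 × 1024 MiB because drivers often report slightly less than the nominal size.

```python
import subprocess

# Roughly the 12GB VRAM minimum; slack allows for drivers that report
# a little under the nominal card size.
MIN_VRAM_MIB = 12_000


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for each GPU's total memory; raises if the tool is missing or fails."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_vram_mib(out)


def has_enough_vram(vram_mib: list[int], minimum: int = MIN_VRAM_MIB) -> bool:
    """Return True if at least one GPU meets the minimum."""
    return any(v >= minimum for v in vram_mib)
```

Calling has_enough_vram(query_vram_mib()) on the host gives a quick yes or no; it says nothing about system RAM, disk space, or whether Docker can actually see the GPU.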
Verdict
Worth a look if you need DeepSeek-OCR running reliably without debugging its Python internals. Skip it if you were hoping for a lightweight or CPU-friendly solution: this is a workstation-grade deployment.