Bogdanovich77/DeekSeek-OCR---Dockerized-API

DeepSeek-OCR wrapped in Docker so you don't have to wrestle it

A practical packaging job around DeepSeek's vision model, turning PDF chaos into structured Markdown with a FastAPI shim and batch scripts.

1.1k stars · Python · Computer Vision · Data Tooling
Velocity (7d): +4.8 ★ / day · Trend: steady

What it does

This repo wraps DeepSeek-OCR in a Dockerized FastAPI service and provides five Python scripts that scan a data/ folder, feed PDFs to the model, and spit out Markdown (or raw OCR text). You can hit the REST API for live requests, or drop files into data/ and run the batch scripts locally. The scripts differ mainly in post-processing: basic conversion, image extraction, custom prompts from a YAML file, or plain text extraction.
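To make the batch flow concrete, here is a minimal sketch of the "scan data/ and pair each PDF with an output file" step. It is not the repo's code: the function name plan_batch, the choice to write the output next to the input, and the default -MD.md suffix are assumptions for illustration, and the actual OCR call is left out entirely.

```python
from pathlib import Path


def plan_batch(data_dir: str, suffix: str = "-MD.md") -> list[tuple[Path, Path]]:
    """Pair every PDF directly under data_dir with the output file it would produce.

    Hypothetical helper: the real scripts may name or place outputs differently.
    """
    root = Path(data_dir)
    jobs = []
    for pdf in sorted(root.glob("*.pdf")):
        # e.g. report.pdf -> report-MD.md, written beside the input (an assumption)
        jobs.append((pdf, pdf.with_name(pdf.stem + suffix)))
    return jobs


if __name__ == "__main__":
    for pdf, out in plan_batch("data"):
        print(f"{pdf} -> {out}")
```

Keeping the suffix as a parameter means the same planning step could serve several output variants, with only the post-processing differing between them.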

The interesting bit

The real work here isn’t the model—it’s the duct tape. The README explicitly notes this project patches a bug in the upstream DeepSeek-OCR library where tokenize_with_images() fails on startup because the prompt parameter is missing. The Docker build transparently swaps in fixed files. That’s a service to anyone who’s tried to run the original and hit an opaque initialization crash.

Key highlights

  • FastAPI backend with /ocr/image, /ocr/pdf, and /ocr/batch endpoints
  • Five processor scripts with a clear suffix convention (-MD.md, -OCR.md, -CUSTOM.md) so you can compare outputs side-by-side
  • Enhanced variants extract images to data/images/ and clean up special tokens
  • Custom prompt support via custom_prompt.yaml for experimenting with extraction instructions
  • Includes copy-paste Python and JavaScript client examples
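For a feel of what calling one of those endpoints might look like, the sketch below builds (but does not send) a multipart upload using only the standard library. The /ocr/pdf path comes from the list above; the base URL, port 8000, and the form field name "file" are assumptions, so check the repo's own client examples for the real request shape.

```python
import urllib.request
import uuid


def build_ocr_request(base_url: str, endpoint: str, filename: str, payload: bytes,
                      field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one file.

    The field name "file" is a guess, not something the README confirms.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    # Assumed address; nothing is sent until urllib.request.urlopen(req) is called.
    req = build_ocr_request("http://localhost:8000", "/ocr/pdf", "sample.pdf", b"%PDF-1.4")
    print(req.get_method(), req.full_url)
```

Passing the returned request to urllib.request.urlopen would perform the upload once the container is running.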

Caveats

  • Hardware appetite is serious: 12GB VRAM minimum, 32GB RAM (64GB+ recommended), 50GB storage
  • Requires NVIDIA GPU with CUDA 11.8+, Docker with GPU support, and the NVIDIA Container Toolkit—this won’t run on your laptop’s integrated graphics
  • The README is truncated mid-sentence in the “Custom Files” section, so the full list of patches isn’t visible
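Given those requirements, a quick preflight check before pulling a large image can save some time. The sketch below is an assumption-laden helper, not part of the repo: it looks for docker and nvidia-smi on the PATH and compares reported GPU memory against the 12GB figure, read here as 12 × 1024 MiB. The function names are hypothetical; the nvidia-smi query flags are standard ones.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 12 * 1024  # the 12GB minimum quoted above, interpreted as GiB


def parse_vram_mib(smi_output: str) -> list[int]:
    """Extract one memory total (MiB) per GPU, ignoring non-numeric lines."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip().isdigit()]


def gpu_ok(smi_output: str, min_mib: int = MIN_VRAM_MIB) -> bool:
    """True if at least one listed GPU meets the memory minimum."""
    return any(v >= min_mib for v in parse_vram_mib(smi_output))


def query_vram() -> str:
    """Ask nvidia-smi for total memory per GPU; empty string if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    print("docker found:", shutil.which("docker") is not None)
    print("GPU meets VRAM minimum:", gpu_ok(query_vram()))
```

This only covers the GPU memory and tooling side; it does not verify the CUDA version, system RAM, disk space, or that Docker can actually see the GPU.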

Verdict

Worth a look if you need DeepSeek-OCR running reliably without debugging its Python internals. Skip it if you were hoping for a lightweight or CPU-friendly solution—this is a workstation-grade deployment.
