← all repositories
TimmyOVO/deepseek-ocr.rs

Python-free OCR stack that speaks OpenAI

A Rust rewrite of DeepSeek-OCR with three vision backends, DSQ quantization, and a drop-in HTTP server—no conda required.

deepseek-ocr.rs
Velocity · 7d
+9.6
★ / day
Trend
steady
star history

What it does

This is a Rust workspace that runs document OCR and visual-language inference locally using Candle, with three model backends to choose from. You get a CLI for batch jobs and an HTTP server that exposes /v1/chat/completions and /v1/responses, so OpenAI SDKs and tools like Open WebUI connect without adapters. It downloads weights automatically from Hugging Face or ModelScope on first run.

The interesting bit

The project is essentially a Rust-native port of a Python + Transformers pipeline, but the rewrite buys you more than just memory safety. Prompt token construction runs ~97× faster than the reference Python stack, and the server automatically collapses chat history to a single turn so OCR outputs stay deterministic even when chatty clients send full conversation context.

Key highlights

  • Three backends with clear trade-offs: DeepSeek-OCR (~13GB RAM, highest accuracy), PaddleOCR-VL (~9GB, lighter and faster), and DotsOCR (30–50GB for high-res layout/reading-order tasks).
  • DSQ-quantized variants (Q4_K through Q8_0) for each backend to shrink weight memory.
  • Apple Metal and x86 MKL support are first-class; NVIDIA CUDA is available but marked alpha.
  • Shared config.toml keeps CLI and server in sync; runtime overrides resolve cleanly from flags → config → defaults → request payload.
  • Pre-built macOS (Metal) and Windows binaries ship via GitHub Actions artifacts.

Caveats

  • CUDA support is explicitly alpha: “expect rough edges while we finish kernel coverage.”
  • DotsOCR’s vision tower is heavy; the README warns it can fall to ~12 tok/s on CPU and demands 30–50GB RAM/VRAM for high-resolution documents.
  • Debug builds are “extremely slow”; you must compile --release for usable throughput.

Verdict

Worth a look if you want local OCR without dragging in Python, conda, or the GIL—especially on Apple Silicon. Skip it if you need production-grade CUDA today or if your hardware can’t stomach the larger models’ memory appetite.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.