sml2h3/ddddocr

A Python library that solves CAPTCHAs so you don't have to

Offline OCR SDK trained on synthetic data to crack text, slider, and detection-based CAPTCHAs with minimal dependencies.

14.2k stars · Python · Computer Vision
Velocity (7d): +7.9 ★/day · Trend: steady

What it does

ddddocr is an offline Python SDK for recognizing CAPTCHAs without calling external APIs. It handles text-based CAPTCHAs (including Chinese and special characters), slider-gap detection, and general object detection in verification images. Install it with pip install ddddocr, create one DdddOcr instance, and pass it raw image bytes.
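
To make the "bytes in, text out" flow concrete, here is a minimal sketch. The helper name read_captcha and the file name in the usage note are illustrative; the only library call relied on is the classification method of a DdddOcr instance, as described in the project README.

```python
from pathlib import Path


def read_captcha(path, engine) -> str:
    """Read an image file as raw bytes and return the recognized text.

    `engine` is expected to be an already-created ddddocr.DdddOcr()
    instance (or anything exposing a compatible `classification` method).
    """
    image_bytes = Path(path).read_bytes()  # the SDK takes bytes, not a file path
    return engine.classification(image_bytes)
```

In a script this would be used roughly as: import ddddocr, build ocr = ddddocr.DdddOcr() once, then call read_captcha("captcha.png", ocr) for each image. The engine is passed in rather than created inside the helper so the same instance can be reused across calls.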

The interesting bit

The project trains on “large-scale randomly generated data” rather than collecting real CAPTCHAs, which sidesteps the usual dataset bottleneck. It also bundles multiple ONNX models and switches between them via boolean flags like beta=True or det=True — a slightly quirky parameter system where some flags silently override others.
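
One way to keep the flag combinations straight is to name the modes explicitly and map each to its constructor arguments. The sketch below does that. The mode names and the init_kwargs helper are hypothetical; the flag combinations (beta=True, det=True, det=False with ocr=False for slider work, plus use_gpu and show_ad) are the ones shown in the project README as best I recall, so check them against the installed version.

```python
def init_kwargs(mode: str, use_gpu: bool = False) -> dict:
    """Return keyword arguments for ddddocr.DdddOcr(...) for one task.

    Mode names are made up for this example; they are not part of ddddocr.
    """
    presets = {
        "ocr": {},                              # default text model
        "ocr_beta": {"beta": True},             # second built-in text model
        "detection": {"det": True},             # object detection
        "slide": {"det": False, "ocr": False},  # slider helpers, both model flags off
    }
    if mode not in presets:
        raise ValueError(f"unknown mode: {mode!r}")

    kwargs = dict(presets[mode], show_ad=False)  # copy so presets stay untouched
    if use_gpu:
        kwargs["use_gpu"] = True  # assumes a CUDA-enabled ONNX Runtime install
    return kwargs
```

An instance would then be created with ddddocr.DdddOcr(**init_kwargs("detection")). Picking exactly one named mode per instance avoids passing flags that might override each other.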

Key highlights

  • Runs entirely offline; no API keys or network calls
  • Supports GPU acceleration via ONNX Runtime (CUDA required)
  • Includes two built-in OCR models (common_old.onnx default, common.onnx via beta=True)
  • Custom model import via import_onnx_path + charsets_path for niche CAPTCHA types
  • Slider CAPTCHA solving with two algorithms: edge matching and image difference comparison
  • Cross-platform: 64-bit Windows, 64-bit Linux (x86-64 and ARM64), macOS x64 (M-series chips need extra setup)
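
The two slider algorithms in the list above can be wrapped behind one helper, sketched below. It assumes the README's method names, slide_match for edge matching and slide_comparison for image difference, and assumes both return a dict whose "target" entry starts with the x coordinate of the gap. That return shape is an assumption worth verifying against your installed version. The find_gap_x name is hypothetical.

```python
def find_gap_x(engine, target_bytes: bytes, background_bytes: bytes,
               method: str = "match") -> int:
    """Return the x offset of the slider gap, in pixels.

    `engine` is expected to be ddddocr.DdddOcr(det=False, ocr=False).
    For "match", `target_bytes` is the slider piece; for "comparison",
    it is the background image that contains the gap.
    """
    if method == "match":
        result = engine.slide_match(target_bytes, background_bytes)
    elif method == "comparison":
        result = engine.slide_comparison(target_bytes, background_bytes)
    else:
        raise ValueError(f"unknown method: {method!r}")

    # Assumed shape: {"target": [x, ...]} with x as the first element.
    return int(result["target"][0])
```

Which method fits depends on what the CAPTCHA exposes: edge matching needs the slider piece as a separate image, while difference comparison needs both the gapped and the complete background.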

Caveats

  • 32-bit Windows and Linux are explicitly unsupported
  • The old=True compatibility flag currently does nothing; only beta=True actually swaps models
  • Initialization is slow; docs warn against creating a new instance per image
  • show_ad=True displays sponsor ads on init — set to False for production
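
The slow-initialization and show_ad caveats can both be handled in one place by caching a single engine. This is a minimal sketch, not the project's recommended pattern: get_engine and the factory parameter are hypothetical names, and the import is done lazily so the module loads even before ddddocr is installed.

```python
from functools import lru_cache


def _make_default_engine():
    """Create the real engine; runs once per factory because of the cache below."""
    import ddddocr  # imported lazily, only when an engine is first needed
    return ddddocr.DdddOcr(show_ad=False)  # suppress the sponsor banner


@lru_cache(maxsize=None)
def get_engine(factory=_make_default_engine):
    """Return a shared engine, building it on first use only.

    Passing a different `factory` (for tests, say) yields a separately
    cached object for that factory.
    """
    return factory()
```

Request handlers or worker loops would call get_engine().classification(image_bytes) instead of constructing DdddOcr per image, so the model load cost is paid once per process.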

Verdict

Worth a look if you’re automating against legacy CAPTCHA systems and want to avoid paid API services. Skip it if you’re dealing with reCAPTCHA v3, hCaptcha, or other modern behavioral challenges — the docs themselves point to commercial services for those.
