zacharywhitley/awesome-ocr

A thousand stars for a list of OCR lists

A curated index of 100+ tools for turning images into text, from deskewing scanned books to spotting text in the wild.

What it does

This is an awesome-list repository that catalogs open-source projects across the entire OCR pipeline: deskewing and dewarping scanned documents, segmenting pages into lines, words, and regions, detecting tables, recognizing handwritten text, and localizing text in natural scene images. Most entries link to a GitHub repo, and many also link to the corresponding research paper.

The interesting bit

The list exposes how fragmented OCR really is. There is no single “just use this” library; instead you chain specialists: perhaps page_dewarp for curved book pages, LayoutParser for page structure, then vedastr or SimpleHTR for the recognition itself. The maintainer even flags dead ends, such as one dewarping repo annotated “No code :(”.
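
To make the "chain specialists" idea concrete, here is a minimal Python sketch of a three-stage pipeline. Everything in it is an assumption for illustration: `Page`, `dewarp`, `segment`, `recognize`, and `run_pipeline` are hypothetical placeholders, not the APIs of page_dewarp, LayoutParser, vedastr, or SimpleHTR. In a real stack each stage body would wrap one of those tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical container passed from stage to stage."""
    image: str                                    # stand-in for pixel data
    regions: List[str] = field(default_factory=list)
    text: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


# A stage is any callable that takes a Page and returns a Page.
Stage = Callable[[Page], Page]


def dewarp(page: Page) -> Page:
    # Placeholder: a real stage would flatten curved pages here.
    page.history.append("dewarp")
    return page


def segment(page: Page) -> Page:
    # Placeholder: pretend layout analysis found two text lines.
    page.regions = [f"{page.image}:line{i}" for i in range(2)]
    page.history.append("segment")
    return page


def recognize(page: Page) -> Page:
    # Placeholder: a real stage would run a recognition model per region.
    page.text = [f"<text of {region}>" for region in page.regions]
    page.history.append("recognize")
    return page


def run_pipeline(page: Page, stages: List[Stage]) -> Page:
    """Apply each stage in order, feeding its output to the next."""
    for stage in stages:
        page = stage(page)
    return page


result = run_pipeline(Page(image="scan.png"), [dewarp, segment, recognize])
print(result.history)
print(result.text)
```

Keeping every stage behind the same `Page -> Page` signature means one specialist can be swapped for another from the list without touching the rest of the chain.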

Key highlights

  • Covers niche problems most developers forget: document dewarping, medieval manuscript layout analysis, form segmentation
  • Handwritten recognition section includes both deep learning models and cloud-service wrappers like Handprint
  • Table detection ranges from research code (TableTransformer) to practical extraction tools (Camelot, ExtractTable-py)
  • Many entries include a paper link and publication year, making it useful for tracing which techniques are current vs. stale
  • 1,004 stars suggest the community lacks a better centralized index

Caveats

  • No code in this repo itself; it is purely a curated list with minimal organization beyond headings
  • Quality of listed projects varies wildly, and there is no evaluation or comparison between alternatives
  • Some sections are just bullet links with no description; others have helpful one-line summaries

Verdict

Worth bookmarking if you are building a document-processing pipeline and need to survey options before committing to a stack. Skip it if you want a single turnkey library—this is a map, not a vehicle.
