A thousand stars for a list of OCR tools
A curated index of 100+ tools for turning images into text, from deskewing scanned books to spotting text in the wild.

What it does
This is an awesome-list repository that catalogs open-source projects across the entire OCR pipeline: deskewing and dewarping scanned documents, segmenting pages into lines/words/regions, detecting tables, recognizing handwritten text, and localizing text in natural scene images. Each entry links to a GitHub repo and often includes the corresponding research paper.
The interesting bit
The list exposes how fragmented OCR really is. There is no single “just use this” library; instead you chain specialists—maybe page_dewarp for curved book pages, LayoutParser for structure, then vedastr or SimpleHTR for actual recognition. The maintainer even flags dead ends, like one dewarping repo annotated “No code :(”.
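To make the chaining idea concrete, here is a minimal sketch of a staged pipeline. It calls none of the listed projects: `Page`, `dewarp`, `detect_layout`, `recognize`, and `run_pipeline` are hypothetical placeholders, and the file name is made up. The only point is the shape of the code, where each specialist is a swappable function with the same signature.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Carries one scanned page through the pipeline (hypothetical container)."""
    image: str  # stand-in for pixel data; a real pipeline would hold an array
    regions: List[str] = field(default_factory=list)
    text: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


# Every stage takes a Page and returns a Page, so stages can be swapped freely.
Stage = Callable[[Page], Page]


def dewarp(page: Page) -> Page:
    # Placeholder: a real stage would hand the image to a dewarping tool.
    page.image = f"flattened({page.image})"
    page.history.append("dewarp")
    return page


def detect_layout(page: Page) -> Page:
    # Placeholder: a real stage would ask a layout-analysis tool for regions.
    page.regions = ["title", "body"]
    page.history.append("layout")
    return page


def recognize(page: Page) -> Page:
    # Placeholder: a real stage would run a recognizer on each region.
    page.text = [f"<text of {region}>" for region in page.regions]
    page.history.append("recognize")
    return page


def run_pipeline(page: Page, stages: List[Stage]) -> Page:
    """Apply each stage in order, passing the page along."""
    for stage in stages:
        page = stage(page)
    return page


result = run_pipeline(Page(image="scan_001.png"), [dewarp, detect_layout, recognize])
print(result.history)
```

Replacing one specialist, say a handwriting recognizer for a printed-text one, then means changing a single entry in the stage list rather than rewriting the pipeline.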
Key highlights
- Covers niche problems most developers forget: document dewarping, medieval manuscript layout analysis, form segmentation
- Handwritten recognition section includes both deep learning models and cloud-service wrappers like Handprint
- Table detection ranges from research code (TableTransformer) to practical extraction tools (Camelot, ExtractTable-py)
- Many entries include a paper link and year, making the list useful for tracing which techniques are current vs. stale
- 1,004 stars suggest the community lacks a better centralized index
Caveats
- No code in this repo itself; it is purely a curated list with minimal organization beyond headings
- Quality of listed projects varies wildly, and there is no evaluation or comparison between alternatives
- Some sections are just bullet links with no description; others have helpful one-line summaries
Verdict
Worth bookmarking if you are building a document-processing pipeline and need to survey options before committing to a stack. Skip it if you want a single turnkey library—this is a map, not a vehicle.