Is yomitoku open source?

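Not in the strict sense. The source code of kotaro-kinoshita/yomitoku is public on GitHub, but it is licensed under CC BY-NC-SA 4.0, which permits non-commercial use only, so it is source-available rather than open source as the OSI defines it. The project is tracked on heatdrop.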

What language is yomitoku written in?

kotaro-kinoshita/yomitoku is primarily written in Python.

How popular is yomitoku?

kotaro-kinoshita/yomitoku has 1.6k stars on GitHub.

Where can I find yomitoku?

kotaro-kinoshita/yomitoku is on GitHub at https://github.com/kotaro-kinoshita/yomitoku.

kotaro-kinoshita/yomitoku

Japanese OCR that actually reads the room (and the ruby text)

A document-AI toolkit built for the specific miseries of Japanese layout: vertical text, furigana, tables, and handwriting.

★ 1.6k stars · Python · Computer Vision · Data Tooling

What it does

YomiToku is a Python document-analysis engine trained specifically on Japanese documents. It runs OCR, layout analysis, table-structure recognition, and reading-order estimation, then exports to HTML, Markdown, JSON, CSV, or searchable PDF. It also extracts figures and can perform schema-based structured-data extraction via either rule-based matching or an optional LLM backend.
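To make the export step concrete, here is a minimal sketch of turning recognized table cells into a Markdown table. The `Cell` type and `cells_to_markdown` function are hypothetical names invented for this illustration; they are not YomiToku's result schema or its exporter, and real table output also has to deal with merged cells, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for one recognized table cell (not YomiToku's schema)."""
    row: int
    col: int
    text: str


def cells_to_markdown(cells: list[Cell]) -> str:
    """Render cells as a Markdown table, treating row 0 as the header."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        # Escape pipes and flatten newlines so cell text cannot break the table.
        grid[c.row][c.col] = c.text.replace("|", "\\|").replace("\n", " ")

    lines = ["| " + " | ".join(grid[0]) + " |"]
    lines.append("| " + " | ".join("---" for _ in range(n_cols)) + " |")
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)


cells = [
    Cell(0, 0, "品名"), Cell(0, 1, "金額"),
    Cell(1, 0, "りんご"), Cell(1, 1, "120円"),
]
markdown = cells_to_markdown(cells)
print(markdown)
```

The same grid could just as easily be written out as CSV or nested JSON, which is why a structured intermediate result is a convenient hub for several export formats.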

The interesting bit

Most OCR tools treat Japanese as an afterthought. YomiToku was trained on Japanese datasets and claims support for over 7,000 characters, vertical writing, handwritten text, and furigana (ruby text) — with a dedicated flag to strip those ruby annotations out. The reading-order estimation is the quiet killer feature: it tries to preserve semantic layout instead of spewing left-to-right garbage across complex magazine or report pages.
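To see why reading order matters for vertical text, consider a deliberately naive heuristic: group text boxes into columns, read columns right to left, and read each column top to bottom. This is only an illustrative sketch. YomiToku uses trained models for layout and reading order, and the `Box` type, `vertical_reading_order` function, and `tol` parameter below are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical text box: top-left corner plus width and height, in pixels."""
    text: str
    x: float
    y: float
    w: float
    h: float


def vertical_reading_order(boxes: list[Box], tol: float = 0.5) -> list[Box]:
    """Order boxes for vertical text: columns right-to-left, each top-to-bottom.

    Two boxes share a column when their horizontal centers differ by at most
    `tol` times the wider box's width.
    """
    columns: list[list[Box]] = []
    # Visit boxes from the right side of the page to the left, so columns
    # are created in reading order.
    for box in sorted(boxes, key=lambda b: -(b.x + b.w / 2)):
        cx = box.x + box.w / 2
        for col in columns:
            ref = col[0]
            ref_cx = ref.x + ref.w / 2
            if abs(cx - ref_cx) <= tol * max(box.w, ref.w):
                col.append(box)
                break
        else:
            columns.append([box])

    ordered: list[Box] = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda b: b.y))
    return ordered


# Two vertical columns, supplied in scrambled order.
boxes = [
    Box("まだ無い。", x=160, y=110, w=20, h=150),
    Box("吾輩は", x=200, y=10, w=20, h=90),
    Box("名前は", x=160, y=10, w=20, h=90),
    Box("猫である。", x=200, y=110, w=20, h=150),
]
ordered = vertical_reading_order(boxes)
print("".join(b.text for b in ordered))
```

A plain left-to-right, top-to-bottom sort of the same boxes would interleave the two columns. Real pages add multi-column layouts, captions, and tables, which is where a simple geometric rule like this one breaks down.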

Key highlights

Four custom-trained PyTorch models (text detection, text recognition, layout analysis, table structure) running on ≤8 GB VRAM, with a CPU-optimized lightweight variant available
The structured-data extractor works in two modes: fast rule-based extraction for fixed forms, or LLM-powered extraction for irregular documents such as receipts and business cards
Handles multi-page PDFs, can ignore headers/footers, and offers fine-grained control over line-break handling and character encoding (including Shift_JIS and CP932)
Explicitly not optimized for scene text (signs, billboards); built for scanned documents and PDFs
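Two of the highlights above, rule-based extraction and encoding control, can be illustrated together with a short standard-library sketch. The `RULES` table, field names, and helper functions are hypothetical and do not reflect YomiToku's actual schema format or API; the point is only to show why fixed forms suit pattern rules, and why the choice between Shift_JIS and CP932 matters for characters such as ①.

```python
import csv
import io
import re

# Hypothetical rules for a fixed-layout invoice (not YomiToku's schema format).
# Each pattern accepts either an ASCII or a full-width colon after the label.
RULES = {
    "invoice_no": re.compile(r"請求書番号[:：]\s*(\S+)"),
    "item": re.compile(r"品名[:：]\s*(\S+)"),
    "total": re.compile(r"合計[:：]\s*([\d,]+)円"),
}


def extract_fields(text: str) -> dict:
    """Apply each rule to the OCR text; fields with no match become None."""
    result = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def to_csv_bytes(record: dict, encoding: str = "cp932") -> bytes:
    """Serialize one record as CSV (header plus one row) in the given encoding."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buf.getvalue().encode(encoding)


ocr_text = "請求書番号: A-1024\n品名: ①りんご\n合計: 1,980円"
record = extract_fields(ocr_text)
csv_bytes = to_csv_bytes(record, encoding="cp932")
print(csv_bytes.decode("cp932"))

# CP932 is Microsoft's superset of Shift_JIS. The circled digit "①" is one of
# its extension characters, so Python's strict shift_jis codec rejects it.
try:
    to_csv_bytes(record, encoding="shift_jis")
except UnicodeEncodeError as exc:
    print("shift_jis cannot encode:", exc.object[exc.start:exc.end])
```

Rules like these are fast and predictable but brittle: they only work while the labels and layout stay fixed, which is the trade-off that makes an LLM backend attractive for receipts and business cards.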

Caveats

CC BY-NC-SA 4.0 license: non-commercial use only unless you buy a commercial license via AWS Marketplace or direct sales
The lightweight model caps line length at 50 characters, making it unsuitable for dense English documents
GPU strongly recommended for the standard models; CPU inference is described as slow

Verdict

Worth a look if you’re drowning in Japanese paperwork, government reports, or mixed-layout forms. Skip it if you need street-sign OCR or if the non-commercial licensing is a hard blocker — the commercial version exists, but it’s a separate product.
