heatdrop.ai

The hottest AI & LLM repositories on GitHub — measured, ranked, and explained.

naptha/tesseract.js

OCR in the browser without calling a single API

Tesseract.js compiles the venerable Tesseract engine to WebAssembly so you can extract text from images entirely client-side.

★ 38.1k stars · JavaScript · Computer Vision

Velocity (7d): +9.5 ★ / day · Trend: steady

What it does

Tesseract.js wraps a WebAssembly build of the Tesseract OCR engine, exposing it through a simple JavaScript worker API. Feed it an image URL, Blob, or buffer; get back recognized text in one of 100+ languages. It runs in browsers via CDN, ESM, or webpack, and on Node.js servers without touching native dependencies.
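A minimal sketch of that worker flow, assuming the v5-or-later style in which `createWorker(lang)` returns a promise and `worker.recognize(image)` resolves to an object with `data.text`. The helper name `recognizeText` is hypothetical, and `createWorker` is passed in as a parameter so the snippet runs without the library installed; in real use it would come from `require('tesseract.js')`.

```javascript
// Hypothetical helper: OCR a single image and return its plain text.
// `createWorker` is injected; in real use:
//   const { createWorker } = require('tesseract.js');
async function recognizeText(createWorker, image, lang = 'eng') {
  // Creating a worker loads the WASM engine and the language data.
  const worker = await createWorker(lang);
  try {
    // `image` may be a URL, Blob, or buffer, per the description above.
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}
```

Called as `await recognizeText(createWorker, 'receipt.png')`, where the file name is only a placeholder.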

The interesting bit

The project is deliberately a thin wrapper, not a fork. It does not modify Tesseract’s recognition model, add PDF support, or chase accuracy tweaks. That restraint keeps it maintainable, and it also means the README openly points users to Scribe.js when they need PDF parsing or model improvements, an unusual and honest bit of scope discipline.

Key highlights

Ships language packs on demand; v5 cut English downloads by 54% and Chinese by 73%
Supports real-time video recognition via worker threads
v6 fixed a long-standing memory leak and reduced runtime memory across the board
Output formats beyond plain text (like hocr, blocks) are now opt-in, not default
Requires Node.js 16+ for v7; API has shifted across major versions, so check migration notes
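For repeated recognition such as video frames, the usual pattern is to create the worker once and reuse it, since startup (engine plus language data) is the expensive part. The sketch below assumes the same `createWorker` / `recognize` / `terminate` calls as current major versions; `recognizeFrames` and the `frames` argument are hypothetical names, and `createWorker` is injected so the example is self-contained.

```javascript
// Hypothetical helper: run OCR over a sequence of frames with one worker.
// `frames` is any iterable of images (URLs, Blobs, buffers, canvases).
async function recognizeFrames(createWorker, frames, lang = 'eng') {
  const worker = await createWorker(lang); // pay the startup cost once
  const texts = [];
  try {
    for (const frame of frames) {
      // Frames are processed sequentially on the same worker.
      const { data } = await worker.recognize(frame);
      texts.push(data.text);
    }
  } finally {
    await worker.terminate();
  }
  return texts;
}
```

This processes frames one at a time; whether that keeps up with a live video feed depends on frame size and hardware.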

Caveats

No PDF support; the README is explicit that this is out of scope
Breaking changes are common across major versions: createWorker went async in v4, argument signatures changed in v5, and non-text outputs were disabled by default in v6
Accuracy is whatever upstream Tesseract provides; do not expect model improvements here
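The v6 change to opt-in outputs is the one most likely to surprise upgraders. As a rough sketch, assuming `worker.recognize` accepts an output selector as its third argument (as the project's API docs describe; verify against the migration notes for your version), requesting block-level output might look like this. `recognizeWithBlocks` is a hypothetical name and `createWorker` is injected for self-containment.

```javascript
// Hypothetical helper: request structured block output in addition to text.
async function recognizeWithBlocks(createWorker, image, lang = 'eng') {
  const worker = await createWorker(lang);
  try {
    // Second argument: recognition options (none here).
    // Third argument: which output formats to produce (assumed v6+ shape).
    const { data } = await worker.recognize(image, {}, { blocks: true });
    return { text: data.text, blocks: data.blocks };
  } finally {
    await worker.terminate();
  }
}
```

Without the third argument, code written for earlier versions that reads non-text fields would find them empty after upgrading.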

Verdict

Use this when you need OCR without infrastructure: a static site, an Electron app, a serverless function. Skip it if you need PDF text extraction, production-grade accuracy tuning, or a stable API across upgrades.