mateogianolio/ocr

OCR from scratch: a neural net that learns to read, in Node.js

A vanilla JavaScript MLP that trains itself on generated captchas or MNIST digits, then exports a standalone predictor you can require like any module.

★ 1.1k stars · JavaScript · Computer Vision · ML Frameworks

What it does

This repo trains a plain multi-layer perceptron to recognize characters from images, then serializes the trained network to ./ocr.js — a self-contained module you require() and feed binary pixel arrays. It works with either the classic MNIST handwritten digits or synthetic training data generated from a hacked-up captcha engine.
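
A minimal sketch of how such an exported predictor might be consumed. The exact shape of what ./ocr.js exports is not spelled out here, so this assumes it is a function that takes a flat 0/1 pixel array and returns one score per character class. The `predict` stub, `CHARSET`, and `decode` are hypothetical names used only for illustration.

```javascript
// Assumption: in real use this line would be
//   const predict = require('./ocr.js');
// A stub stands in here so the example runs on its own.
const CHARSET = '0123456789'; // hypothetical: classes in training order

// Stub predictor: returns one score per class. It simply favors the class
// whose index equals the number of "on" pixels (not a real network).
function predict(pixels) {
  const on = pixels.reduce((sum, p) => sum + p, 0);
  return Array.from(CHARSET, (_, i) => (i === on % CHARSET.length ? 0.9 : 0.01));
}

// Pick the character whose score is highest (argmax).
function decode(scores, charset) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return charset[best];
}

const pixels = [0, 1, 1, 0, 1, 0, 0, 0, 0]; // 3x3 image, flattened row by row
console.log(decode(predict(pixels), CHARSET)); // prints "3" with this stub
```

The point is the calling pattern: a flat array goes in, a list of scores comes out, and the caller maps the best score back to a character.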

The interesting bit

The training pipeline is the clever part: it renders glyphs to a canvas, thresholds the pixels into a flat binary array, and trains the MLP without touching TensorFlow or PyTorch. The README includes hard numbers (95.16% on MNIST in ~22 minutes on a 2015 MacBook Pro), which gives you an honest baseline for how far a simple neural net can get.
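
The thresholding step can be sketched roughly as follows. This is not the repo's code: it assumes RGBA input laid out like a canvas `ImageData.data` buffer (4 bytes per pixel), averages the color channels to get a gray level, and treats dark pixels as 1. The function name `binarize` and the dark-equals-1 convention are assumptions.

```javascript
// Convert RGBA pixel data (4 bytes per pixel, as in canvas ImageData.data)
// into a flat array of 0/1 values, one entry per pixel.
function binarize(rgba, threshold) {
  const out = [];
  for (let i = 0; i < rgba.length; i += 4) {
    // Average the three color channels to get a gray level in 0..255.
    // The alpha channel (rgba[i + 3]) is ignored in this sketch.
    const gray = (rgba[i] + rgba[i + 1] + rgba[i + 2]) / 3;
    // Assumption: dark pixels (ink) become 1, light pixels (background) 0.
    out.push(gray < threshold ? 1 : 0);
  }
  return out;
}

// A 2x2 image: black, white, mid-gray, dark-gray.
const rgba = Uint8ClampedArray.from([
  0, 0, 0, 255,
  255, 255, 255, 255,
  128, 128, 128, 255,
  60, 60, 60, 255,
]);
console.log(binarize(rgba, 100)); // [ 1, 0, 0, 1 ]
```

The output length is width × height, which is what lets the network take the image as a single flat input vector.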

Key highlights

Exports a standalone ocr.js predictor after training; no runtime dependencies on the training stack
Supports MNIST or auto-generated synthetic datasets from configurable fonts and character sets
Flattened 1D binary input keeps the implementation simple (no convolutions, no GPU)
Documented benchmarks for digits, lowercase, and mixed alphanumeric sets
Configurable via a single config.json — hidden layer size, learning rate, image dimensions, threshold
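
To make the "flattened input, no convolutions" idea concrete, here is a minimal forward pass for a one-hidden-layer MLP in plain JavaScript. It is a sketch under assumptions, not the repo's implementation: the `config` key names are hypothetical (the real config.json may use different ones), sigmoid is an assumed activation, and the constant weights exist only to keep the example deterministic.

```javascript
// Hypothetical settings; the key names in the repo's config.json may differ.
const config = { imageSize: 3, hidden: 4, classes: 2 };

const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// One dense layer: out[j] = sigmoid(biases[j] + sum_i weights[j][i] * input[i])
function layer(input, weights, biases) {
  return weights.map((row, j) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * input[i], biases[j]))
  );
}

// Weight matrix filled with a constant. Demo only: real training would
// start from random weights and adjust them with backpropagation.
function constantMatrix(rows, cols, value) {
  return Array.from({ length: rows }, () => new Array(cols).fill(value));
}

const inputSize = config.imageSize * config.imageSize; // flattened pixel count
const w1 = constantMatrix(config.hidden, inputSize, 0.1);
const b1 = new Array(config.hidden).fill(0);
const w2 = constantMatrix(config.classes, config.hidden, 0.5);
const b2 = new Array(config.classes).fill(0);

// Flat 0/1 pixels -> hidden activations -> one score per class.
function forward(pixels) {
  const hidden = layer(pixels, w1, b1);
  return layer(hidden, w2, b2);
}

const scores = forward([0, 1, 1, 0, 1, 0, 0, 0, 0]);
console.log(scores.length); // 2, one score per class
```

Changing the image dimensions or hidden layer size only changes the shapes of the two weight matrices, which is why a single config file can drive the whole network.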

Caveats

Requires Cairo system dependencies for the canvas package, which can be a pain to install
The alphanumeric benchmark omits the digit 4 from its training text (likely a typo: 012356789)
No mention of model versioning, incremental training, or handling real-world noise beyond the synthetic generator

Verdict

Worth a look if you want to understand MLPs without framework magic, or need a tiny, embeddable OCR predictor. Skip it if you need production-grade accuracy on messy scans — this is a teaching tool and weekend-project baseline, not a replacement for Tesseract or modern CNNs.

Frequently asked

What is mateogianolio/ocr? A vanilla JavaScript MLP that trains itself on generated captchas or MNIST digits, then exports a standalone predictor you can require like any module.
Is ocr open source? Yes: mateogianolio/ocr is open source and released under the MIT license.
What language is ocr written in? mateogianolio/ocr is primarily written in JavaScript.
How popular is ocr? mateogianolio/ocr has 1.1k stars on GitHub.
Where can I find ocr? mateogianolio/ocr is on GitHub at https://github.com/mateogianolio/ocr.