mateogianolio/ocr

OCR from scratch: a neural net that learns to read, in Node.js

A vanilla JavaScript MLP that trains itself on generated captchas or MNIST digits, then exports a standalone predictor you can require like any module.

1.1k stars · JavaScript · Computer Vision · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This repo trains a plain multi-layer perceptron to recognize characters from images, then serializes the trained network to ./ocr.js — a self-contained module you require() and feed binary pixel arrays. It works with either the classic MNIST handwritten digits or synthetic training data generated from a hacked-up captcha engine.

The interesting bit

The training pipeline is the clever part: it renders glyphs to a canvas, thresholds the pixels into a flat binary array, and trains the MLP without touching TensorFlow or PyTorch. The README includes hard numbers (95.16% on MNIST in ~22 minutes on a 2015 MacBook Pro), which gives you an honest baseline for how far a simple neural net can get.
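To make the thresholding step concrete, here is a minimal sketch of turning pixel data into a flat binary array. It is not the repo's code: the `binarize` helper, the channel-averaging rule, and the "dark pixels become 1" polarity are all assumptions for illustration. The input uses the RGBA layout that a canvas `getImageData` call returns, so the sketch runs without the canvas package.

```javascript
// Hypothetical helper (not from the repo): convert RGBA pixel data,
// laid out like canvas ImageData, into a flat array of 0s and 1s.
function binarize(rgba, threshold) {
  const bits = [];
  for (let i = 0; i < rgba.length; i += 4) {
    // Average the R, G and B channels; alpha (index i + 3) is ignored.
    const brightness = (rgba[i] + rgba[i + 1] + rgba[i + 2]) / 3;
    // Assumption: dark pixels count as "ink" and map to 1.
    bits.push(brightness < threshold ? 1 : 0);
  }
  return bits;
}

// A 2x2 image: black, white, dark gray, light gray.
const pixels = new Uint8ClampedArray([
  0, 0, 0, 255,
  255, 255, 255, 255,
  40, 40, 40, 255,
  200, 200, 200, 255,
]);

console.log(binarize(pixels, 128)); // [ 1, 0, 1, 0 ]
```

The output length is width × height, which is why the network's input layer size follows directly from the configured image dimensions.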

Key highlights

  • Exports a standalone ocr.js predictor after training; no runtime dependencies on the training stack
  • Supports MNIST or auto-generated synthetic datasets from configurable fonts and character sets
  • Flattened 1D binary input keeps the implementation simple (no convolutions, no GPU)
  • Documented benchmarks for digits, lowercase, and mixed alphanumeric sets
  • Configurable via a single config.json — hidden layer size, learning rate, image dimensions, threshold
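
For a sense of how little machinery a flat-input MLP needs, the sketch below runs a forward pass through one hidden layer and picks the most likely class. Everything in it is hypothetical: the `net` object shape, the sigmoid activation, and the hand-picked weights are assumptions for illustration, not the format of the exported ocr.js.

```javascript
// Sigmoid activation, squashing any real number into (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// One dense layer: weights[j] holds the input weights of neuron j.
function layer(input, weights, biases) {
  return weights.map((row, j) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * input[i], biases[j]))
  );
}

// Forward pass of an MLP with a single hidden layer.
function forward(input, net) {
  const hidden = layer(input, net.hidden.weights, net.hidden.biases);
  return layer(hidden, net.output.weights, net.output.biases);
}

// Index of the largest output, i.e. the predicted class.
function argmax(values) {
  return values.indexOf(Math.max(...values));
}

// Toy network with hand-picked weights: 4 inputs, 2 hidden, 2 outputs.
// Class 0 fires when the first two pixels are set, class 1 for the last two.
const net = {
  hidden: { weights: [[4, 4, -4, -4], [-4, -4, 4, 4]], biases: [0, 0] },
  output: { weights: [[6, -6], [-6, 6]], biases: [0, 0] },
};

console.log(argmax(forward([1, 1, 0, 0], net))); // 0
console.log(argmax(forward([0, 0, 1, 1], net))); // 1
```

A trained network differs only in where the weights come from and how many neurons each layer holds; the per-prediction cost stays a couple of nested loops, which is what makes a dependency-free exported predictor practical.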

Caveats

  • Requires Cairo system dependencies for the canvas package, which can be a pain to install
  • The alphanumeric benchmark omits the digit 4 from its training text (likely a typo: 012356789)
  • No mention of model versioning, incremental training, or handling real-world noise beyond the synthetic generator

Verdict

Worth a look if you want to understand MLPs without framework magic, or need a tiny, embeddable OCR predictor. Skip it if you need production-grade accuracy on messy scans — this is a teaching tool and weekend-project baseline, not a replacement for Tesseract or modern CNNs.
