bakwc/JamSpell

Spell checking that actually reads the room

A C++ spellchecker that uses surrounding words to pick better corrections, wrapped in bindings for half a dozen languages.

662 stars · C++ · Data Tooling · Other AI
Star velocity (7d): +0.2 ★/day · Trend: steady

What it does

JamSpell corrects typos by looking at context rather than at the misspelled word in isolation. It ships as a C++ library with SWIG bindings for Python, Java, C#, Ruby, and others, plus a built-in HTTP server. You load a pre-trained model or train one on your own text corpus, then call FixFragment() or query the HTTP API.
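As a rough illustration of the library workflow, here is a minimal Python sketch. It assumes the `jamspell` package is installed and a model file has been downloaded. `TSpellCorrector`, `LoadLangModel`, and `FixFragment` are the names used in the project's README; the helpers `load_corrector` and `fix_lines` are hypothetical, and the failure check assumes `LoadLangModel` returns a falsy value when loading fails.

```python
def load_corrector(model_path):
    """Load a JamSpell model; requires the jamspell package to be installed."""
    import jamspell  # imported lazily so fix_lines can be used without it

    corrector = jamspell.TSpellCorrector()
    # Assumption: LoadLangModel returns a falsy value if the model cannot be loaded.
    if not corrector.LoadLangModel(model_path):
        raise RuntimeError(f"could not load model: {model_path}")
    return corrector


def fix_lines(corrector, lines):
    """Run FixFragment over each line and return the corrected lines in order."""
    return [corrector.FixFragment(line) for line in lines]
```

A typical call would be `fix_lines(load_corrector("en.bin"), ["I am the begt spell cherken!"])`, where `en.bin` stands in for whichever model file you downloaded or trained. Because `fix_lines` only relies on a `FixFragment` method, it can be exercised with a stub object when the native library is not available.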

The interesting bit

The benchmarks are refreshingly honest: JamSpell is tested on Wikipedia/news text and separately on Sherlock Holmes to check for overfitting. It beats Norvig’s classic solution and Hunspell on both accuracy and speed (roughly 5K words/sec versus 163–284 for Hunspell) while breaking fewer correct words. The “Top 7” metric is a nice touch: even when the first guess is wrong, the right word often appears in the top seven candidates.
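To make the “Top 7” idea concrete, the sketch below computes a top-k hit rate over a handful of made-up cases. The function name `top_k_rate` and the sample data are hypothetical and are not taken from JamSpell's benchmark code; the point is only how a top-1 rate and a top-7 rate can differ on the same candidate lists.

```python
def top_k_rate(cases, k):
    """Fraction of cases whose expected word is among the first k candidates."""
    if not cases:
        return 0.0
    hits = sum(1 for candidates, expected in cases if expected in candidates[:k])
    return hits / len(cases)


# Hypothetical (ranked candidates, expected word) pairs for illustration only.
cases = [
    (["best", "beet", "begs"], "best"),           # first guess is right
    (["spell", "spill", "shell"], "spell"),       # first guess is right
    (["chicken", "checker", "cherub"], "checker"),  # right word ranked second
    (["form", "from", "forum"], "farm"),          # right word never proposed
]

top1 = top_k_rate(cases, 1)
top7 = top_k_rate(cases, 7)
```

Here the third case counts as a miss for top-1 but a hit for top-7, which is the gap the metric is designed to expose.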

Key highlights

  • Context-aware n-gram model, not just edit distance
  • ~4,800–5,500 words/second in benchmarks (vs. ~395 for Norvig, ~163–284 for Hunspell)
  • 79% fix rate on Wikipedia/news test, 72% on Sherlock Holmes out-of-domain test
  • Pre-trained models for en, fr, ru; train your own with a text file and alphabet file
  • HTTP API with GET/POST endpoints for correction and candidate lists
  • SWIG interface for generating bindings to other languages
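The first highlight, context-aware ranking, can be illustrated with a toy bigram model. This is not JamSpell's actual model or code: the tiny corpus, the add-one smoothing, and the names `build_counts`, `bigram_prob`, and `best_candidate` are all assumptions chosen to show how neighbouring words can break a tie that edit distance alone cannot.

```python
from collections import Counter

# Tiny made-up training text, already tokenized by whitespace.
CORPUS = (
    "she saw a big dog . he found a bug in the code . the big dog barked"
).split()


def build_counts(tokens):
    """Count single words and adjacent word pairs."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


UNIGRAMS, BIGRAMS = build_counts(CORPUS)


def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + len(UNIGRAMS))


def best_candidate(prev, candidates, nxt):
    """Pick the candidate that fits best between its left and right neighbours."""
    return max(candidates, key=lambda w: bigram_prob(prev, w) * bigram_prob(w, nxt))


# The typo "bog" is one edit away from both "big" and "bug";
# only the surrounding words separate the two.
fix_before_dog = best_candidate("a", ["big", "bug"], "dog")
fix_before_in = best_candidate("a", ["big", "bug"], "in")
```

With this corpus the same typo resolves to "big" before "dog" and to "bug" before "in", which is the behaviour an edit-distance-only checker cannot reproduce.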

Caveats

  • The “Pro” version with CatBoost ranking, runtime word addition, and more languages is a separate commercial product at jamspell.com; the open-source version is essentially frozen at 0.0.12
  • Pre-trained models are “simple” ones trained on only 600K sentences; the README itself says you need millions for production quality
  • Windows support and built-in Java/C#/Ruby bindings are Pro-only; open-source users roll their own via SWIG

Verdict

Good fit if you need fast, embeddable spell correction with tolerable accuracy and don’t mind training your own model. Skip it if you want turnkey excellence or active maintenance: the last release was 0.0.12, and development appears to have moved to the paid Pro tier.
