Is deep-text-corrector open source?

Yes — atpaino/deep-text-corrector is open source, released under the Apache-2.0 license.

What language is deep-text-corrector written in?

atpaino/deep-text-corrector is primarily written in Python.

How popular is deep-text-corrector?

atpaino/deep-text-corrector has 1.2k stars on GitHub.

Where can I find deep-text-corrector?

atpaino/deep-text-corrector is on GitHub at https://github.com/atpaino/deep-text-corrector.


atpaino/deep-text-corrector

Teaching seq2seq to put 'the' back where it belongs

A TensorFlow grammar corrector that learns by deliberately mangling movie dialog.

★ 1.2k stars · Python · Language Models · Other AI


What it does

Deep Text Corrector trains sequence-to-sequence models to fix small grammatical errors in short, conversational text—think SMS messages or chat. It starts with clean English samples, randomly strips articles, breaks contractions, and swaps homophones to create synthetic training pairs, then trains an attention-based LSTM to reverse the damage.
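
The perturbation step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `perturb`, the word lists, the default probabilities, and the way a contraction is "broken" here (dropping the apostrophe) are all assumptions made for the example.

```python
import random

# Hypothetical word lists for illustration; the project's own lists and
# perturbation rates may differ.
ARTICLES = {"a", "an", "the"}
CONTRACTION_BREAKS = {"don't": "dont", "it's": "its", "you're": "youre"}
HOMOPHONES = {"their": "there", "there": "their", "then": "than", "than": "then"}


def perturb(tokens, rng, p_article=0.25, p_contraction=0.25, p_homophone=0.25):
    """Return a noisy copy of `tokens`; the clean list serves as the target."""
    noisy = []
    for tok in tokens:
        low = tok.lower()
        if low in ARTICLES and rng.random() < p_article:
            continue  # drop the article entirely
        if low in CONTRACTION_BREAKS and rng.random() < p_contraction:
            noisy.append(CONTRACTION_BREAKS[low])  # damage the contraction
            continue
        if low in HOMOPHONES and rng.random() < p_homophone:
            noisy.append(HOMOPHONES[low])  # swap in the wrong homophone
            continue
        noisy.append(tok)
    return noisy


clean = "I don't think the answer is better than their guess".split()
# Probabilities of 1.0 force every eligible perturbation, for a clear demo.
noisy = perturb(clean, random.Random(0), 1.0, 1.0, 1.0)
print(noisy)
# ['I', 'dont', 'think', 'answer', 'is', 'better', 'then', 'there', 'guess']
```

Each `(noisy, clean)` pair would then act as one source/target training example for the seq2seq model.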

The interesting bit

The clever part is the decoding constraint: the model is forbidden from inventing words. It can only reuse tokens from the input or draw from a small “corrective” set of words that corrections typically insert, such as “the” or “than”. This is enforced with a hard binary mask on the logits, plus a neat OOV trick that assumes out-of-vocabulary words appear in the same order in the input and the output. That assumption is reasonable when the only “errors” are missing articles, not vocabulary swaps.
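
A hard mask of this kind might look like the sketch below, written in plain Python rather than TensorFlow to keep it self-contained. The helper names `mask_logits` and `greedy_pick`, the contents of `CORRECTIVE_TOKENS`, and the choice to implement the mask by setting disallowed logits to negative infinity are assumptions for illustration; the project may apply its mask differently.

```python
import math

# Hypothetical corrective set; the source only names "the" and "than" as examples.
CORRECTIVE_TOKENS = {"a", "an", "the", "than", "then"}


def mask_logits(logits, vocab, input_tokens):
    """Set logits of disallowed tokens to -inf so they can never be chosen."""
    allowed = set(input_tokens) | CORRECTIVE_TOKENS
    return [
        logit if token in allowed else -math.inf
        for token, logit in zip(vocab, logits)
    ]


def greedy_pick(logits, vocab):
    """Return the vocabulary token with the highest logit."""
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best]


vocab = ["the", "cat", "dog", "sat", "banana"]
logits = [1.0, 0.5, 0.2, 0.1, 3.0]  # the unmasked argmax would be "banana"
masked = mask_logits(logits, vocab, ["cat", "sat"])
print(greedy_pick(masked, vocab))  # prints "the"; "banana" is masked out
```

Because the mask only removes candidates, it changes which token wins at decode time without altering the scores of the tokens that remain allowed.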

Key highlights

Synthetic data generation from the Cornell Movie-Dialogs Corpus (300k+ lines), with perturbation rates loosely based on CoNLL 2014 shared task figures
Biased decoding via logit masking, applied only at decode time; the mask is left off during training to preserve the learning signal
Straightforward OOV resolution: assumes input and output OOV sequences match one-to-one
Outperforms an identity-function baseline on accuracy across all sentence lengths; BLEU results are mixed
Ships as an extension of TensorFlow’s 2016-era seq2seq tutorial code, with an IPython notebook for interactive training
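
The one-to-one OOV assumption from the highlights above can be sketched as a small post-processing step. The names `resolve_oov` and `UNK`, and the fallback of leaving an unknown token in place when no input OOV remains, are assumptions for this example rather than the repository's implementation.

```python
UNK = "<unk>"  # hypothetical placeholder symbol for out-of-vocabulary tokens


def resolve_oov(input_tokens, output_tokens, vocab):
    """Replace the i-th UNK in the output with the i-th OOV token of the input."""
    oov_inputs = [tok for tok in input_tokens if tok not in vocab]
    resolved, next_oov = [], 0
    for tok in output_tokens:
        if tok == UNK and next_oov < len(oov_inputs):
            resolved.append(oov_inputs[next_oov])
            next_oov += 1
        else:
            resolved.append(tok)  # known token, or UNK with nothing left to map
    return resolved


vocab = {"i", "saw", "the", "in", UNK}
source = "i saw zebra in kalamazoo".split()
decoded = ["i", "saw", "the", UNK, "in", UNK]  # model output with unknowns
print(resolve_oov(source, decoded, vocab))
# ['i', 'saw', 'the', 'zebra', 'in', 'kalamazoo']
```

The mapping relies purely on order, so it would break if a correction reordered or dropped rare words; with error types limited to articles, contractions, and homophones, that situation should be uncommon.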

Caveats

Requires TensorFlow >= 0.11, which dates the project to roughly 2016–2017; modern TF compatibility is unclear
Error types are narrowly scoped: missing articles, broken contractions, and a handful of homophone swaps—don’t expect it to fix subject-verb agreement or comma splices
The README notes the Cornell corpus was chosen because it was “the largest collection of conversational written English I could find that was mostly grammatically correct,” so the model learns from scripted movie dialog rather than real SMS or chat text

Verdict

Worth a look if you’re studying constrained seq2seq decoding or grammar correction as a controlled generation problem. Skip it if you need a production-ready corrector; this is a research demonstration with a narrow error model and dated dependencies.
