unitaryai/detoxify

Toxic comment detection, minus the Kaggle ensemble headache

Detoxify wraps pretrained transformers from three Jigsaw toxicity challenges into a single, importable classifier.

★ 1.3k stars · Python · Domain Apps · Language Models · ML Frameworks

What it does

Detoxify provides pretrained text-classification models for detecting toxic, obscene, threatening, or identity-based hostile comments. Built on Hugging Face Transformers and PyTorch Lightning, it packages weights trained on the three Jigsaw Toxic Comment Challenges (original, unintended bias, and multilingual) behind a unified Python interface. Pass it a string or a list of strings and it returns a probability for each label, such as toxicity, insult, or identity_attack.
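A minimal sketch of that interface, assuming the package is installed with `pip install detoxify` and that `Detoxify(model_name).predict(...)` returns a dict mapping each label to a probability (one value for a single string, a list of values for a list of strings), as the project README describes. The helpers `score_comments` and `top_labels` and the 0.5 threshold are illustrative names and values, not part of the library.

```python
from typing import Dict, List, Sequence, Union


def score_comments(texts: Union[str, Sequence[str]], model_name: str = "original"):
    """Score one string or a list of strings with a pretrained Detoxify model.

    The import is done lazily because it pulls in PyTorch, and the first
    call downloads the pretrained checkpoint.
    """
    from detoxify import Detoxify  # third-party: pip install detoxify

    return Detoxify(model_name).predict(texts)


def top_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Hypothetical helper: labels whose probability meets the threshold.

    Expects the per-label scores for a single comment.
    """
    return sorted(label for label, p in scores.items() if p >= threshold)


# Typical use (requires the detoxify package and a network connection):
#   scores = score_comments("example text")
#   flagged = top_labels(scores)
```

The threshold is a policy choice rather than something the models prescribe; it is worth tuning against labelled examples from the target platform.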

The interesting bit

The authors deliberately passed on the absolute top Kaggle scores, which rely on massive model ensembles, in favor of a simple, single-model API. They also openly document the models’ limitations, warning that profanity can trigger false positives regardless of intent and flagging the risk of bias against minority groups. That kind of ethical candor is still depressingly rare in off-the-shelf NLP tools.

Key highlights

Three model flavors: original (BERT), unbiased (RoBERTa), and multilingual (XLM-RoBERTa), plus lightweight ALBERT variants.
Near-leaderboard performance: on the first Jigsaw challenge the original model scores 98.64% mean AUC, just shy of the top Kaggle score of 98.86%.
Multilingual coverage for seven languages (English, French, Spanish, Italian, Portuguese, Turkish, and Russian), with per-language AUC ranging from roughly 89% to 97%.
The unbiased model includes identity labels such as male, jewish, and psychiatric_or_mental_illness to help surface demographic bias.
Loads via torch.hub or direct import without wrangling Kaggle datasets or training pipelines.
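One way to picture the model lineup is as a small lookup table. The sketch below assumes the checkpoint names used in the project README (`original`, `unbiased`, `multilingual`, `original-small`, `unbiased-small`); `pick_model` is a hypothetical helper written for illustration, not a library function.

```python
# Checkpoint name -> backbone family, following the highlights above.
# The "-small" names are assumed from the project README.
MODEL_FAMILIES = {
    "original": "BERT",
    "unbiased": "RoBERTa",
    "multilingual": "XLM-RoBERTa",
    "original-small": "ALBERT",
    "unbiased-small": "ALBERT",
}


def pick_model(multilingual: bool = False, bias_aware: bool = False, small: bool = False) -> str:
    """Hypothetical helper: choose a checkpoint name from three yes/no needs."""
    if multilingual:
        if small:
            # No lightweight multilingual variant is listed among the models.
            raise ValueError("no lightweight multilingual checkpoint is listed")
        return "multilingual"
    name = "unbiased" if bias_aware else "original"
    return f"{name}-small" if small else name


# The chosen name would then be passed to the library, for example:
#   from detoxify import Detoxify
#   model = Detoxify(pick_model(bias_aware=True))
# The README also shows a torch.hub route, e.g.
#   torch.hub.load("unitaryai/detoxify", "toxic_bert")
```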

Caveats

The multilingual model is intended only for the seven languages it was trained on; results on any other language are unsupported.
Because the models flag profanity regardless of tone, humorous or self-deprecating posts can be misclassified as toxic.
The README states the intended use is research or moderator assistance, not unsupervised automated moderation at scale.
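The caveats above suggest treating scores as a signal for human reviewers rather than a removal trigger. A rough sketch of that policy, assuming the multilingual model and a caller that already knows each comment's language; the function name `triage`, the ISO 639-1 codes, the outcome strings, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Mapping

# ISO 639-1 codes for the seven supported languages (the codes themselves
# are an assumption; the source lists the languages by name).
SUPPORTED_LANGUAGES = frozenset({"en", "fr", "es", "it", "pt", "tr", "ru"})


def triage(scores: Mapping[str, float], language: str, review_threshold: float = 0.5) -> str:
    """Route a scored comment to a human reviewer; never remove it automatically.

    Returns "unsupported_language", "needs_review", or "no_action".
    """
    if language not in SUPPORTED_LANGUAGES:
        # Scores for other languages are not reliable, so do not act on them.
        return "unsupported_language"
    if max(scores.values(), default=0.0) >= review_threshold:
        # Profanity alone can push scores up, so a person makes the final call.
        return "needs_review"
    return "no_action"
```

Keeping the only "positive" outcome as a review request means a profanity-driven false positive costs a moderator a few seconds instead of silencing a user.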

Verdict

A solid starting point for researchers or platform builders who need pretrained toxicity baselines without fine-tuning a transformer themselves. If you require state-of-the-art ensemble accuracy or nuanced context detection, you will still need to roll your own.

Frequently asked

What is unitaryai/detoxify? Detoxify wraps pretrained transformers from three Jigsaw toxicity challenges into a single, importable classifier.
Is detoxify open source? Yes, unitaryai/detoxify is open source, released under the Apache-2.0 license.
What language is detoxify written in? unitaryai/detoxify is primarily written in Python.
How popular is detoxify? unitaryai/detoxify has 1.3k stars on GitHub.
Where can I find detoxify? unitaryai/detoxify is on GitHub at https://github.com/unitaryai/detoxify.