unitaryai/detoxify
BERT-based toxicity classifier that predicts toxic comments across three Jigsaw challenges.

Velocity (7d): +0.6 ★/day · Trend: steady
Detoxify provides trained models and code for multi-label toxicity classification of text comments. It uses transformer-based architectures (BERT, ALBERT) with PyTorch Lightning for training and supports three model variants: original, unbiased, and multilingual. The models classify comments into toxicity categories such as toxic, severe_toxic, obscene, threat, insult, and identity_attack, with AUC scores reaching 93%+ on benchmark tests.
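As a rough illustration of how the per-category scores might be consumed, the sketch below wraps the `Detoxify(...).predict(...)` call documented in the project's README and turns the resulting scores into a list of flagged labels. The helper names `classify` and `flag_labels`, the 0.5 threshold, and the example scores are assumptions made for illustration; they are not part of the library.

```python
from typing import Dict, List


def flag_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return labels whose score meets the threshold, highest score first.

    The 0.5 default is an arbitrary illustrative cutoff, not a value
    recommended by the Detoxify authors.
    """
    flagged = [(label, score) for label, score in scores.items() if score >= threshold]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in flagged]


def classify(text: str, model_name: str = "original") -> Dict[str, float]:
    """Score one comment with a Detoxify checkpoint (hypothetical wrapper).

    Requires `pip install detoxify`; the checkpoint is downloaded on first use.
    model_name is one of the variants: "original", "unbiased", "multilingual".
    """
    from detoxify import Detoxify  # imported lazily so flag_labels works without it

    raw = Detoxify(model_name).predict(text)
    # Convert to plain Python floats so the result is easy to log or serialize.
    return {label: float(score) for label, score in raw.items()}


# Hypothetical scores, shaped like the per-label dictionary returned for one string.
example_scores = {"toxic": 0.91, "insult": 0.62, "threat": 0.03}
print(flag_labels(example_scores))  # ['toxic', 'insult']
```

With the package installed, `flag_labels(classify("some comment"))` would run the same thresholding on real model output. In practice a single global threshold is a simplification: each category may warrant its own cutoff, chosen on validation data.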