A 2017 hate-speech dataset that sparked a field — and a reckoning
The ICWSM paper and Python 2.7 code that showed how easily "offensive" and "hate speech" get conflated, with follow-up work finding racial bias in the dataset itself.

What it does
This repository holds the original dataset, lexicon, and Jupyter notebooks from a 2017 ICWSM paper on automated hate speech detection. The authors labeled ~25K tweets as “hate speech,” “offensive language,” or “neither,” then built classifiers to separate the three. The data is provided as CSV and Python 2.7 pickles, alongside the notebooks and a standalone classifier script for new data.
The interesting bit
The paper’s core argument, that “offensive” and “hate speech” are routinely muddled by annotators and models alike, turned out to be prescient. The authors later published follow-up work (2019) finding racial bias embedded in this very dataset, which makes the repo a case study in how early NLP benchmark datasets can inherit and amplify the prejudices of their annotators.
Key highlights
- ~25K manually labeled tweets with three-way classification (hate speech / offensive / neither)
- Custom lexicon generated to improve hate speech detection accuracy
- Pre-built classifier pipeline with test case for running on new data
- CSV and Python 2.7 pickle formats provided
- Explicit content warnings throughout; authors track usage via a contact form
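Because the three-way split is the heart of the dataset, a first sanity check is usually the class balance. The following is a minimal sketch using only the standard library, not the authors' code. The column name `class`, the 0/1/2 label coding, and the sample rows are all assumptions made for illustration; check them against the actual CSV header before relying on them.

```python
import csv
import io
from collections import Counter

# Assumed label coding (not stated in this summary):
# 0 = hate speech, 1 = offensive language, 2 = neither.
LABEL_NAMES = {"0": "hate speech", "1": "offensive language", "2": "neither"}


def class_distribution(csv_file, label_column="class"):
    """Count rows per label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Unrecognized codes are bucketed rather than silently dropped.
        counts[LABEL_NAMES.get(row[label_column], "unknown")] += 1
    return counts


# Placeholder rows standing in for the real file; the text is invented.
SAMPLE = """class,tweet
2,placeholder text a
1,placeholder text b
1,placeholder text c
0,placeholder text d
"""

counts = class_distribution(io.StringIO(SAMPLE))
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name}: {n} ({n / total:.0%})")
```

To run this against the real data, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample, where `path` is wherever the CSV lives in your checkout.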
Caveats
- Repository is no longer maintained; author explicitly rejects issues and pull requests about Python/package compatibility
- Code is Python 2.7, which reached end of life in January 2020
- The 2019 follow-up paper identified racial bias in this dataset; the README links to it but does not integrate those findings into the original materials
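If you want to open the Python 2.7 pickles from Python 3 anyway, the standard library's `pickle.load` accepts an `encoding` argument intended for exactly this case. The sketch below is a generic approach, not something the repository documents. The helper names are hypothetical, and the demo bytes are hand-written to mimic a Python 2 pickle of a short byte string. Pickles that contain pandas, NumPy, or scikit-learn objects may still fail because of library version differences, so the CSV is likely the safer route.

```python
import pickle


def loads_py2(data):
    """Unpickle bytes written by Python 2 from within Python 3.

    encoding="latin1" maps every Python 2 str byte to one character,
    so decoding never raises; the default ("ASCII") fails on bytes > 127.
    """
    return pickle.loads(data, encoding="latin1")


def load_py2_pickle(path):
    """Same idea for a file on disk (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


# Hand-built protocol-2 pickle of the UTF-8 bytes for "café",
# standing in for a Python 2 str (illustrative only).
PY2_BYTES = b"\x80\x02U\x05caf\xc3\xa9q\x00."

raw = loads_py2(PY2_BYTES)
# latin1 decoding leaves UTF-8 text as mojibake; round-trip to repair it.
fixed = raw.encode("latin1").decode("utf-8")
print(repr(raw), "->", repr(fixed))
```

The round trip through `latin1` is lossless, which is why it is a common first attempt: nothing is discarded even when the original text was UTF-8.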
Verdict
Worth studying if you work on content moderation, dataset ethics, or the history of NLP bias research; less useful if you need production-ready tooling. Treat it as a time-capsule paper artifact, not a dependency.