OpenAI's own AI detection homework, published
A release of GPT-2 outputs designed to make the model detectable—part research dataset, part admission that this problem needs outside help.

What it does
This repository distributes millions of GPT-2-generated text samples alongside real documents from WebText, the corpus GPT-2 was trained on. It includes outputs from every model size (117M to 1.5B parameters), generated both with unrestricted random sampling and with Top-K 40 truncation, plus samples from a finetuned variant that spits out Amazon reviews. OpenAI wants researchers to study detection, biases, and whatever else the data reveals.
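To make the two generation strategies concrete, here is a minimal sketch of the difference between unrestricted random sampling and Top-K truncation. It assumes the model's per-token scores arrive as a plain list of logits. The function names (`top_k_filter`, `softmax`, `sample_token`) are illustrative and do not come from the repository.

```python
import math
import random

def top_k_filter(logits, k):
    """Keep the k highest logits; set the rest to -inf so they get zero probability."""
    if k <= 0 or k >= len(logits):
        return list(logits)  # no truncation: plain random sampling
    cutoff = sorted(logits, reverse=True)[k - 1]
    # Ties at the cutoff are all kept, so slightly more than k tokens may survive.
    return [x if x >= cutoff else float("-inf") for x in logits]

def softmax(logits):
    """Turn logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, k=0, rng=random):
    """Sample one token index; k=0 means unrestricted, k=40 would mirror Top-K 40."""
    probs = softmax(top_k_filter(logits, k))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `k=40`, only the 40 most likely tokens can ever be chosen at each step, which cuts off the long tail of unlikely words that unrestricted sampling occasionally picks.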
The interesting bit
The dataset doubles as a benchmark with baseline detection scores already baked in: accuracy in the mid-90s (percent) for Top-K 40 outputs, but only mid-70s to high-80s for unrestricted random sampling. OpenAI also notes, almost in passing, that finetuning lets adversaries evade detection, which is less a feature and more a warning about the arms race they were already anticipating in 2019.
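For readers unfamiliar with how such accuracy figures are produced, the sketch below trains a toy detector and scores it. It deliberately swaps in a tiny unigram naive Bayes classifier. This is not the method in the repository's baseline code, which this summary does not describe. All function names are hypothetical, and labels follow the assumed convention 0 = human-written, 1 = generated.

```python
import math
from collections import Counter

def train_naive_bayes(texts, labels):
    """Fit per-class unigram counts. Assumes both classes appear in labels."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def predict(model, text):
    """Return the class (0 or 1) with the higher log posterior."""
    counts, docs, vocab = model
    scores = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        # Log prior plus add-one-smoothed log likelihood of each known word.
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            if word in vocab:
                score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, texts, labels):
    """Fraction of texts whose predicted class matches the true label."""
    hits = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)
```

If a test set holds equal numbers of real and generated samples, guessing at random lands near 50% accuracy, which is the floor the reported mid-70s to mid-90s figures should be read against.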
Key highlights
- 250K real WebText documents plus 250K generated samples per model, per generation strategy
- Train/valid/test splits provided; includes a finetuned Amazon review model for adversarial detection research
- Baseline detection code and analysis included (baseline.py, detection.md)
- Data migrated from Google Cloud to Azure blob storage; download_dataset.py script provided
- Direct data removal contact for WebText contributors (webtextdata@openai.com)
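Once the files are downloaded, turning them into a labeled detection dataset could look like the sketch below. It rests on two assumptions not stated in this summary: that each file is in JSON Lines format (one JSON object per line) and that each record carries its text under a `"text"` key. The helper names and file paths are hypothetical.

```python
import json
from pathlib import Path

def read_texts(path, field="text"):
    """Yield the text of each record in a JSON Lines file (assumed format)."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)[field]

def load_labeled(real_path, generated_path):
    """Return (texts, labels) with 0 = real document, 1 = generated sample."""
    real = list(read_texts(real_path))
    generated = list(read_texts(generated_path))
    return real + generated, [0] * len(real) + [1] * len(generated)
```

Calling `load_labeled` once per split (train, valid, test) keeps the provided partitioning intact, so detectors are compared on the same held-out data.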
Caveats
- The README is sparse on methodology details—how exactly the “initial analysis” was conducted is left to the linked files
- No explicit license mentioned in the provided source
- Storage URLs have changed once already; links may rot
Verdict
Researchers building or evaluating AI text detectors should start here: it pairs a foundational model's outputs with real WebText documents, which makes it a natural reference set. Everyone else can skip; this is raw data and a few scripts, not a tool you run out of the box.