A benchmark that makes relation extraction starve for data on purpose
FewRel forces NLP models to learn entity relationships from a handful of examples, then tests whether they actually generalize.

What it does
FewRel is a dataset and benchmark for few-shot relation extraction: given five or ten candidate relation types (the N in "N-way") and just one or five labeled examples of each (the K in "K-shot"), your model must decide which relation holds between two entities in a query sentence. It ships with 100+ relations, tens of thousands of annotated instances, and baseline implementations including Prototypical Networks and a BERT-based PAIR model.
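To make the N-way K-shot setup concrete, here is a minimal sketch of how one evaluation episode could be sampled. It is not the repo's data loader: the function name `sample_episode`, the `data` layout (relation name mapped to a list of sentences), and the `n_query` parameter are all assumptions made for illustration.

```python
import random


def sample_episode(data, n_way=5, k_shot=1, n_query=1, seed=None):
    """Sample one N-way K-shot episode from a toy dataset.

    `data` is assumed to map each relation name to a list of instances;
    this layout is hypothetical, not FewRel's actual file format.
    """
    rng = random.Random(seed)
    # Pick N relation types for this episode.
    relations = rng.sample(sorted(data), n_way)
    support, query = [], []
    for label, relation in enumerate(relations):
        # Draw K support and n_query query instances without overlap.
        picked = rng.sample(data[relation], k_shot + n_query)
        support += [(inst, label) for inst in picked[:k_shot]]
        query += [(inst, label) for inst in picked[k_shot:]]
    return relations, support, query


if __name__ == "__main__":
    toy = {f"rel{i}": [f"sentence {i}-{j}" for j in range(10)] for i in range(8)}
    rels, sup, qry = sample_episode(toy, n_way=5, k_shot=1, n_query=2, seed=0)
    print(len(rels), len(sup), len(qry))
```

The model sees only the support set for each episode and is scored on the query set, so accuracy reflects how well it generalizes from K examples rather than how much it memorized during training.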
The interesting bit
The project deliberately withholds the test set: you submit your model to their leaderboard for evaluation, so nobody can tune against the test data. FewRel 2.0 then piles on two extra headaches: domain adaptation (Wikipedia → PubMed) and “none-of-the-above” (NOTA) detection, where some query instances match none of the provided relations.
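One simple way to picture none-of-the-above detection is a nearest-prototype classifier that refuses to answer when the query is too far from every class. The sketch below is an illustration only, not how the FewRel baselines handle NOTA: the function names, the fixed `nota_threshold`, and the toy 2-D embeddings are all assumptions.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors into one prototype."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify_with_nota(support, query_vec, nota_threshold):
    """Return the nearest relation label, or "NOTA" if nothing is close.

    `support` maps each relation label to its K support embeddings.
    The distance threshold is a hypothetical knob chosen for illustration.
    """
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    label, dist = min(
        ((lbl, math.dist(query_vec, proto)) for lbl, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= nota_threshold else "NOTA"


if __name__ == "__main__":
    support = {
        "founded_by": [[0.0, 0.0], [0.0, 2.0]],
        "capital_of": [[10.0, 10.0], [10.0, 12.0]],
    }
    print(classify_with_nota(support, [1.0, 1.0], nota_threshold=3.0))
    print(classify_with_nota(support, [5.0, 6.0], nota_threshold=3.0))
```

In a real system the embeddings would come from an encoder such as a CNN or BERT, and the rejection rule would typically be learned rather than a hand-set distance cutoff.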
Key highlights
- Two benchmark tracks: FewRel 1.0 (standard few-shot) and FewRel 2.0 (adds domain adaptation + NOTA detection)
- Baseline models include Proto-CNN and BERT-PAIR, with reproduction commands and reported numbers in the README
- Supports configurable N-way K-shot settings, multiple encoders (CNN, BERT), and adversarial training for domain shift
- Hidden test sets with public leaderboards; validation sets available for local tuning
- Pre-trained embeddings and BERT checkpoint downloadable via provided script
Caveats
- Test data is intentionally withheld, so you cannot run full offline evaluation without submitting to their website
- The repo contains data but not pre-trained files (GloVe, BERT); you need to run download_pretrain.sh to fetch them
- The --fp16 option requires NVIDIA Apex, which is an extra dependency not handled by standard pip
Verdict
Researchers working on few-shot learning for NLP or relation extraction specifically should grab this. If you need an off-the-shelf relation extractor with abundant training data, or you refuse to submit to external leaderboards, look elsewhere.