BERT learns relations by reading between the blanks
A PyTorch reimplementation of ACL 2019's "Matching the Blanks" paper, with ALBERT and BioBERT variants thrown in for good measure.

What it does
Trains BERT-family models to identify semantic relations between entity pairs, such as Cause-Effect or Component-Whole. You feed it text with entities marked [E1]…[/E1] and [E2]…[/E2], and it predicts the relationship. Supports BERT, ALBERT, and BioBERT, with an optional “Matching the Blanks” pre-training phase on unlabeled text before fine-tuning on standard benchmarks like SemEval2010 Task 8.
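To make the input format concrete, here is a minimal sketch of wrapping two entity spans in marker tokens. It assumes closing tags of the form [/E1] and [/E2], as in the paper's marker convention, and whitespace-separated tokens. The helper name `mark_entities` and the example sentence are illustrative, not taken from the repo.

```python
def mark_entities(tokens, e1_span, e2_span):
    """Wrap two non-overlapping token spans in [E1]...[/E1] and [E2]...[/E2].

    Spans are (start, end) token indices with `end` exclusive.
    Hypothetical helper for illustration; not the repo's own API.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (t1 <= s2 or t2 <= s1):
        raise ValueError("entity spans must not overlap")
    out = []
    for i, tok in enumerate(tokens):
        # Opening markers go before the first token of each span.
        if i == s1:
            out.append("[E1]")
        if i == s2:
            out.append("[E2]")
        out.append(tok)
        # Closing markers go after the last token of each span.
        if i == t1 - 1:
            out.append("[/E1]")
        if i == t2 - 1:
            out.append("[/E2]")
    return " ".join(out)


tokens = "The surprise visit caused a frenzy on Palace grounds .".split()
print(mark_entities(tokens, (2, 3), (5, 6)))
```

A relation classifier would then see the marked string and predict a label such as Cause-Effect for the tagged pair.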
The interesting bit
The “Matching the Blanks” pre-training doesn’t need labeled relations at all. It mines entity pairs from raw text using spaCy’s NER and dependency parsing, then trains the model to recognize when two text snippets place their entity pairs in a similar relational context. The paper’s insight: distributional similarity of entity pairs is a free supervision signal. The repo also auto-detects entities at inference time, brute-forcing all possible pairs if you’re too lazy to tag them yourself.
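A rough sketch of how that free supervision signal can be turned into training pairs: two statements count as a positive pair when they mention the same entity pair, and entity mentions are randomly swapped for a [BLANK] token so the model cannot simply memorize names. The blanking probability of 0.7 follows the paper; the data layout, the function names, and the exhaustive pairing (rather than sampled negatives) are simplifying assumptions, and the toy statements are invented.

```python
import random
from itertools import combinations

BLANK = "[BLANK]"


def blank_entities(statement, alpha, rng):
    """Render a statement, replacing each entity by [BLANK] with probability alpha."""
    template, e1, e2 = statement
    shown1 = BLANK if rng.random() < alpha else e1
    shown2 = BLANK if rng.random() < alpha else e2
    return template.format(e1=f"[E1]{shown1}[/E1]", e2=f"[E2]{shown2}[/E2]")


def make_pairs(statements, alpha=0.7, seed=0):
    """Return (text_a, text_b, label) for every pair of statements.

    label is 1 when both statements mention the same (e1, e2) pair, else 0.
    Each statement is (template, e1, e2); this layout is an assumption.
    """
    rng = random.Random(seed)
    pairs = []
    for a, b in combinations(statements, 2):
        label = int((a[1], a[2]) == (b[1], b[2]))
        pairs.append((blank_entities(a, alpha, rng),
                      blank_entities(b, alpha, rng),
                      label))
    return pairs


# Invented toy statements standing in for mined text.
statements = [
    ("{e1} was founded in {e2}.", "Acme", "Berlin"),
    ("{e2} is home to {e1}.", "Acme", "Berlin"),
    ("{e1} visited {e2} last year.", "Dana", "Oslo"),
]

for text_a, text_b, label in make_pairs(statements):
    print(label, "|", text_a, "|", text_b)
```

In a real pipeline the label would supervise a similarity score between the two statements' relation representations; that model side is left out here.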
Key highlights
- Implements BERT, ALBERT, and BioBERT variants for relation extraction
- Optional MTB pre-training on any continuous text (a CNN dataset is provided, though the paper used larger wiki dumps)
- Inference with manual [E1]/[E2] tags or automatic entity detection via spaCy
- FewRel 1.0 support with 5-way 1-shot results (BERT-large hits 72.8% without any training on FewRel data)
- F1 benchmark graphs for SemEval2010 Task 8 included
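The automatic-detection mode can be pictured as generating one tagged copy of the sentence per candidate entity pair and classifying each copy. The sketch below takes entity character offsets as given (in practice they would come from an NER step such as spaCy's) and enumerates ordered pairs, since relations like Cause-Effect are directional. Whether the repo enumerates pairs in exactly this way is an assumption, and `candidate_taggings` is a hypothetical name.

```python
from itertools import permutations


def candidate_taggings(text, entity_spans):
    """Return one tagged copy of `text` per ordered pair of entity spans.

    `entity_spans` holds (start, end) character offsets, end exclusive.
    Spans are assumed non-overlapping and separated by at least one character.
    """
    tagged = []
    for (s1, t1), (s2, t2) in permutations(entity_spans, 2):
        inserts = [(t1, "[/E1]"), (s1, "[E1]"), (t2, "[/E2]"), (s2, "[E2]")]
        out = text
        # Insert from the rightmost position first so earlier offsets stay valid.
        for pos, marker in sorted(inserts, key=lambda x: x[0], reverse=True):
            out = out[:pos] + marker + out[pos:]
        tagged.append(out)
    return tagged


text = "Alice met Bob in Paris"
spans = [(0, 5), (10, 13), (17, 22)]  # Alice, Bob, Paris
for candidate in candidate_taggings(text, spans):
    print(candidate)
```

With n detected entities this produces n × (n − 1) candidates, which is why the brute-force approach gets expensive on long, entity-dense sentences.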
Caveats
- Not the official paper repo; author is upfront about this
- MTB pre-training results on SemEval are listed as “To add”, so that section is incomplete
- The author warns that pre-training “can take a long time”, and the provided CNN data is smaller than what the paper actually used
Verdict
Worth a look if you’re doing biomedical NLP (BioBERT) or need a working baseline for relation extraction with modern transformers. Skip if you need an exact reproduction of the paper’s numbers: several benchmark rows are still pending.