The paper that launched a thousand fine-tuners
Google's original BERT implementation and pre-trained models, still the reference point for transformer-based NLP.

What it does
This is the official TensorFlow repository for BERT, the bidirectional transformer pre-training approach that dominated NLP benchmarks in 2018. It contains the original implementation plus a zoo of pre-trained models: Base, Large, multilingual variants, Chinese, and later additions like Whole Word Masking and 24 smaller models (2 to 12 layers, hidden sizes 128 to 768, with named points such as “BERT-Tiny,” “BERT-Mini,” “BERT-Small,” and “BERT-Medium”) for resource-constrained environments.
The interesting bit
The README is essentially a changelog of model releases: most of its news is new checkpoints rather than new code, which says a lot about how research infrastructure ages. The 2020 release of smaller models is the most intellectually notable addition. Google explicitly frames it as enabling research at institutions that can’t afford BERT-Large’s compute appetite, and notes that these compact models are most effective with knowledge distillation, where a larger, more accurate teacher produces the fine-tuning labels.
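The README recommends distillation but doesn’t prescribe a particular loss. As a minimal sketch of one common variant, assuming soft-label distillation with a temperature, a student can be trained to match the teacher’s softened output distribution. The helper names (softmax, soft_label_loss), the default temperature of 2.0, and the example logits are hypothetical choices for illustration, not anything from the repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's softened distribution (the teacher's "soft labels").

    Minimized when the student's distribution equals the teacher's.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for one 3-class example.
teacher = [2.0, 0.5, -1.0]
print(soft_label_loss([2.0, 0.5, -1.0], teacher))  # student agrees with teacher
print(soft_label_loss([-1.0, 0.5, 2.0], teacher))  # student disagrees: larger loss
```

A higher temperature flattens the teacher’s distribution, so the student also learns how the teacher ranks the wrong classes, not just which class wins.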
Key highlights
- Pre-trained checkpoints for English (cased/uncased), 104-language multilingual, and Chinese
- Whole Word Masking variants that mask all of a word’s WordPiece tokens together instead of masking fragments independently, improving SQuAD 1.1 F1 by about 1.8 points for BERT-Large Uncased
- TensorFlow Hub integration with a Colab notebook for quick experimentation
- Third-party PyTorch (HuggingFace) and Chainer ports are acknowledged, though Google notes it didn’t build or maintain them
- GLUE benchmark scores published for the smaller model matrix, with hyperparameter search details
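To make the Whole Word Masking distinction concrete, here is a simplified sketch, not the repository’s own pre-training data code. It groups WordPiece tokens into words using the “##” continuation prefix and masks every piece of a chosen word together. The helper names, the masking-budget rule, and the example sentence are assumptions for illustration, and BERT’s 80/10/10 replacement scheme is left out.

```python
import random


def group_whole_words(tokens):
    """Return lists of token indices, one list per whole word.

    Follows BERT's WordPiece convention that a piece starting with "##"
    continues the previous word.
    """
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_prob=0.15, rng=None, mask_token="[MASK]"):
    """Mask whole words until roughly mask_prob of the tokens are covered.

    Simplified sketch: every selected piece becomes mask_token (the 80/10/10
    replacement rule is omitted), and a word that would overshoot the budget
    is skipped rather than partially masked.
    """
    if rng is None:
        rng = random.Random(0)
    groups = group_whole_words(tokens)
    budget = max(1, round(len(tokens) * mask_prob))
    order = list(range(len(groups)))
    rng.shuffle(order)

    masked = list(tokens)
    covered = 0
    for g in order:
        if covered >= budget:
            break
        if covered + len(groups[g]) > budget:
            continue  # never split a word to hit the budget exactly
        for i in groups[g]:
            masked[i] = mask_token
        covered += len(groups[g])
    return masked


tokens = "the man jumped up , put his basket on phil ##am ##mon ' s head".split()
print(whole_word_mask(tokens, mask_prob=0.3, rng=random.Random(1)))
```

Here “phil ##am ##mon” is either masked as a unit or left intact; independent WordPiece masking could hide “##am” alone, leaving the model an easy guess from the surrounding pieces.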
Caveats
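- The repository is frozen in 2020-era TensorFlow 1.x patterns; modern practitioners likely want HuggingFace’s transformers library instead
- The smaller models’ GLUE scores show BERT-Tiny scoring 0.0 on CoLA. That is not a typo: CoLA is scored with Matthews correlation, where 0.0 means the predictions show no correlation with the gold labels, which is where a very limited model struggling with linguistic acceptability can end up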
- Most model swaps need no code changes, but you’ll need to track which tokenization scheme your chosen model expects (cased checkpoints, for example, need --do_lower_case=False in the training scripts)
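Because CoLA uses the Matthews correlation coefficient rather than accuracy, a model can be right on most sentences and still score 0.0. The sketch below assumes binary 0/1 labels and uses the common convention of returning 0.0 when the denominator vanishes; the labels are hypothetical, and the published scores don’t tell us whether BERT-Tiny literally predicted a single class. GLUE tables report the coefficient multiplied by 100.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels.

    Returns 0.0 when the denominator is zero, e.g. when every prediction
    is the same class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical data: 7 acceptable (1) and 3 unacceptable (0) sentences,
# scored by a model that calls every sentence acceptable.
y_true = [1] * 7 + [0] * 3
y_pred = [1] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.7
print(matthews_corrcoef(y_true, y_pred))   # 0.0
```

Seventy percent accuracy and a correlation of zero describe the same predictions, which is why a 0.0 on CoLA reads as “learned nothing useful about acceptability” rather than “got everything wrong.”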
Verdict
Historians of NLP and researchers reproducing 2018-2020 papers still need this. Everyone else has already migrated to more ergonomic wrappers. Worth bookmarking for the pre-trained model URLs alone.