google-research/bert

The paper that launched a thousand fine-tuners

Google's original BERT implementation and pre-trained models, still the reference point for transformer-based NLP.

40k stars · Python · Language Models, ML Frameworks · topic: bert
Velocity (7d): +14 ★/day · Trend: steady

What it does

This is the official TensorFlow repository for BERT, the bidirectional transformer pre-training approach that dominated NLP benchmarks in 2018. It contains the original implementation plus a zoo of pre-trained models: Base, Large, multilingual variants, Chinese, and later additions like Whole Word Masking and 24 smaller models for resource-constrained environments. Those 24 form a grid that runs from “BERT-Tiny” up to BERT-Base size, with “BERT-Mini”, “BERT-Small” and “BERT-Medium” as named points in between.
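The 24 compact checkpoints are easiest to picture as a depth-by-width grid. The sketch below enumerates it under stated assumptions: depths L in {2, 4, 6, 8, 10, 12}, hidden sizes H in {128, 256, 512, 768}, one attention head per 64 hidden units, and the uncased_L-{L}_H-{H}_A-{A} naming pattern of the release archives. The NAMED table and the checkpoint_grid helper are hypothetical illustrations, not code from the repository.

```python
from itertools import product

# Assumed grid of the 2020 compact-model release (English, uncased).
LAYERS = (2, 4, 6, 8, 10, 12)   # transformer depth L
HIDDEN = (128, 256, 512, 768)   # hidden size H

# Named points on the grid; the L=12/H=768 corner matches BERT-Base.
NAMED = {
    (2, 128): "BERT-Tiny",
    (4, 256): "BERT-Mini",
    (4, 512): "BERT-Small",
    (8, 512): "BERT-Medium",
    (12, 768): "BERT-Base",
}


def checkpoint_grid():
    """Return (archive_name, nickname_or_None) for every (L, H) pair."""
    grid = []
    for layers, hidden in product(LAYERS, HIDDEN):
        heads = hidden // 64  # assumption: one attention head per 64 hidden units
        archive = f"uncased_L-{layers}_H-{hidden}_A-{heads}"
        grid.append((archive, NAMED.get((layers, hidden))))
    return grid


if __name__ == "__main__":
    for archive, nickname in checkpoint_grid():
        print(archive, nickname or "")
```

Only five of the 24 configurations carry nicknames; the rest are identified purely by their L/H/A numbers.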

The interesting bit

The README is essentially a changelog of model releases, which tells you something about how research infrastructure ages. The 2020 release of smaller models is the most intellectually notable addition — Google explicitly wants to enable research at institutions that can’t afford BERT-Large’s compute appetite, and suggests these compact models work best when distilled from a larger teacher.
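The distillation setting mentioned above can be sketched as soft-label training: the compact student is fit to the teacher’s predicted class distribution rather than to one-hot gold labels. This is a generic, Hinton-style knowledge-distillation loss in plain Python. The temperature parameter and the softmax and distillation_loss helpers are illustrative assumptions, not the recipe Google used.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the teacher's.

    The teacher's probabilities act as the training labels, so the student is
    rewarded for matching the teacher's full distribution, not just its argmax.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits for one example
    print(distillation_loss([2.5, 1.2, 0.1], teacher))  # student close to teacher: lower loss
    print(distillation_loss([0.1, 1.2, 2.5], teacher))  # student far from teacher: higher loss
```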

Key highlights

  • Pre-trained checkpoints for English (cased/uncased), 104-language multilingual, and Chinese
  • Whole Word Masking variants that mask complete words instead of WordPiece fragments, improving SQuAD F1 by ~1.8 points
  • TensorFlow Hub integration with a Colab notebook for quick experimentation
  • Third-party PyTorch (HuggingFace) and Chainer ports are acknowledged, though Google notes it didn’t build or maintain them
  • GLUE benchmark scores published for the smaller model matrix, with hyperparameter search details
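Whole Word Masking changes only which positions get masked. The sketch below is a simplified illustration, not the repository’s create_pretraining_data.py. It groups WordPiece tokens into words using the ## continuation prefix, then masks whole groups until a budget of roughly mask_prob × sequence length is reached. It always substitutes [MASK] and omits the random-token and keep-original cases of BERT’s full masking scheme. The function names and parameters are hypothetical.

```python
import random


def group_whole_words(tokens):
    """Group token indices into words; a '##' prefix marks a continuation piece."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)  # attach the fragment to the preceding word
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Mask every piece of each selected word together, never a lone fragment."""
    rng = random.Random(seed)
    words = group_whole_words(tokens)
    budget = max(1, round(len(tokens) * mask_prob))  # max tokens to mask
    rng.shuffle(words)
    masked, covered = list(tokens), 0
    for word in words:
        if covered >= budget:
            break
        if covered + len(word) > budget:
            continue  # this word would overshoot the budget; try another
        for i in word:
            masked[i] = mask_token
        covered += len(word)
    return masked


if __name__ == "__main__":
    # The example sentence used in the README's Whole Word Masking section.
    text = "the man jumped up , put his basket on phil ##am ##mon ' s head"
    print(" ".join(whole_word_mask(text.split(), mask_prob=0.3, seed=1)))
```

With standard WordPiece masking, "##mon" alone could be hidden while "phil ##am" stays visible, which makes the prediction easy; here the three pieces are always masked or kept together.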

Caveats

  • The repository is frozen in 2020 TensorFlow patterns; modern practitioners likely want HuggingFace transformers instead
  • The smaller models’ GLUE scores show BERT-Tiny scoring 0.0 on CoLA — not a typo, just a very limited model struggling with linguistic acceptability
  • Swapping checkpoints usually needs no code changes, but the tokenizer settings must match the model: cased checkpoints need --do_lower_case=False, while uncased ones keep the default of True
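The CoLA 0.0 is less strange once you recall that GLUE scores CoLA with the Matthews correlation coefficient (MCC). MCC is 0 for chance-level predictions, including a classifier that gives every sentence the same label. Below is a minimal binary MCC in plain Python, assuming 0/1 labels and returning 0.0 when the denominator vanishes. It illustrates the metric, not a diagnosis of what BERT-Tiny actually predicted.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Constant predictions zero out a factor of the denominator; report 0.0
    # in that case (the convention scikit-learn uses as well).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 1, 0, 1, 1]           # hypothetical acceptability labels
    print(matthews_corrcoef(gold, [1] * 8))   # always "acceptable" -> 0.0
    print(matthews_corrcoef(gold, gold))      # perfect predictions -> 1.0
```

Note that a model always answering "acceptable" here would still score 75% accuracy. That is why MCC, not accuracy, is the headline number for an imbalanced task like CoLA.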
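Why the tokenizer has to match the checkpoint is easiest to see with WordPiece’s greedy longest-match-first splitting. The toy below assumes a tiny hand-written vocabulary (TOY_VOCAB is hypothetical). It also collapses the basic tokenization in the repository’s tokenization.py (accent stripping, punctuation splitting) into a bare lowercase-and-split step, so it illustrates the failure mode rather than reproducing FullTokenizer.

```python
TOY_VOCAB = {"paris", "is", "big", "phil", "##am", "##mon"}  # hypothetical uncased vocab


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry '##'
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return [unk]  # nothing matched: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab, do_lower_case):
    """Optionally lowercase, split on whitespace, then WordPiece each word."""
    if do_lower_case:
        text = text.lower()
    return [piece for word in text.split() for piece in wordpiece(word, vocab)]


if __name__ == "__main__":
    print(tokenize("Paris is big", TOY_VOCAB, do_lower_case=True))   # ['paris', 'is', 'big']
    print(tokenize("Paris is big", TOY_VOCAB, do_lower_case=False))  # ['[UNK]', 'is', 'big']
    print(tokenize("Philammon", TOY_VOCAB, do_lower_case=True))      # ['phil', '##am', '##mon']
```

Feeding cased text to an uncased vocabulary fails silently: nothing crashes, but capitalized words collapse to [UNK] and fine-tuning quality quietly drops.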

Verdict

Historians of NLP and researchers reproducing 2018-2020 papers still need this. Everyone else has already migrated to more ergonomic wrappers. Worth bookmarking for the pre-trained model URLs alone.
