Is bigbird open source?

Yes — google-research/bigbird is open source, released under the Apache-2.0 license.

What language is bigbird written in?

google-research/bigbird is primarily written in Python.

How popular is bigbird?

google-research/bigbird has 633 stars on GitHub.

Where can I find bigbird?

google-research/bigbird is on GitHub at https://github.com/google-research/bigbird.

← all repositories

google-research/bigbird

Sparse attention that lets BERT read the whole chapter

It swaps BERT’s dense self-attention for a sparse block pattern so transformers can handle thousands of tokens without the usual memory explosion.

★633 stars Python Language Models ML Frameworks

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does BigBird is a sparse-attention transformer that retrofits BERT, RoBERTa, and Pegasus-style models to process far longer sequences than standard quadratic attention allows. The core trick is a block-sparse attention mechanism—implemented in core/attention.py—that mixes local window, global tokens, and random blocks. The repository packages this into drop-in encoder and seq2seq replacements, plus pre-trained checkpoints for base and large sizes.

The interesting bit The authors back the sparsity pattern with theoretical claims about what a sparse transformer can still express relative to a full one—rare in a field that usually just throws blocks at the wall and benchmarks them. The Long Range Arena comparison chart included in the repo suggests it keeps pace with other long-range models while cutting memory use.

Key highlights

Three attention modes: original_full, simulated_sparse, and the flagship block_sparse.
Pre-trained BERT-like (bigbr_base, bigbr_large) and Pegasus-like encoder-decoder (bigbp_large) checkpoints available in a public GCS bucket.
Includes fine-tuned tf.SavedModel checkpoints for long-document summarization and a Colab demo for text classification.
The modeling.BertModel class is designed as a near-drop-in replacement for a standard BERT encoder.
Configurable block_size and num_rand_blocks, though the current defaults are three local blocks and two global ones.

Caveats

The codebase is tightly coupled to TPU workflows and only supports statically shaped tensors, so dynamic-length inputs are out.
For sequences under 1,024 tokens, the README itself advises falling back to original_full attention because the sparse machinery offers no benefit.
Hidden dimensions must be divisible by the head count, a constraint inherited from the TPU-centric implementation.

Verdict Researchers or engineers already running BERT/RoBERTa on TPUs and hitting context-length walls should evaluate this. If your sequences are short or your stack is PyTorch-first, this is probably not your bird.

Frequently asked

What is google-research/bigbird?: It swaps BERT’s dense self-attention for a sparse block pattern so transformers can handle thousands of tokens without the usual memory explosion.
Is bigbird open source?: Yes — google-research/bigbird is open source, released under the Apache-2.0 license.
What language is bigbird written in?: google-research/bigbird is primarily written in Python.
How popular is bigbird?: google-research/bigbird has 633 stars on GitHub.
Where can I find bigbird?: google-research/bigbird is on GitHub at https://github.com/google-research/bigbird.