google-research/bigbird
Sparse-attention transformer model extending BERT to process much longer sequences for NLP tasks.

Velocity (7d): +0.3 ★/day · Trend: steady
BigBird is a transformer architecture that uses sparse attention to handle sequences significantly longer than standard BERT can process. It comes with theoretical guarantees about what the sparse approximation can represent, and it improves results on long-document NLP tasks such as question answering and summarization. The codebase includes the core attention implementations, encoder stacks, and packaged BERT and seq2seq models that use BigBird attention.
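As a rough illustration of what "sparse attention" means here, the sketch below builds a token-level attention mask from three components: a sliding window, a few global tokens, and a few random keys per query. The listing above does not spell out the pattern. This combination follows the description in the BigBird paper, and the repository itself works on blocks of tokens rather than single tokens. The function names and default values are hypothetical and are not taken from the codebase.

```python
import random


def sparse_attention_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Return mask[i][j] == True when query i may attend to key j.

    Hypothetical helper for illustration; not part of the bigbird package.
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]

    for i in range(seq_len):
        # Sliding window: each token sees itself and its nearest neighbours.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Random attention: each query also sees a few randomly chosen keys.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True

    # Global tokens (assumed here to be the first positions) attend to
    # every position and are attended to by every position.
    for g in range(min(num_global, seq_len)):
        for j in range(seq_len):
            mask[g][j] = True
            mask[j][g] = True
    return mask


def density(mask):
    """Fraction of query/key pairs that the mask allows."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


if __name__ == "__main__":
    for n in (32, 128, 512):
        print(n, round(density(sparse_attention_mask(n)), 4))
```

In this sketch every non-global row allows at most `window + num_global + num_random` keys, so the number of allowed pairs grows linearly with sequence length instead of quadratically. That is the property that makes longer inputs affordable. A real implementation would also avoid materializing the full mask.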