lucidrains/local-attention
A PyTorch implementation of local windowed attention mechanisms for efficient transformer-based language modeling.

This repository provides a PyTorch implementation of local windowed attention, a transformer component that restricts each attention computation to a fixed-size window, so the per-token cost depends on the window size rather than on the full sequence length. It supports causal masking, relative positional encoding, and a shared query/key space for Reformer-style architectures. The code is designed as a reusable building block for training transformer-based language models, where local attention in the bottom transformer layers serves as a strong baseline.
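To make the windowing idea concrete, here is a minimal sketch of causal local attention in plain PyTorch. It is not this repository's code or API: the function name `local_causal_attention` and its signature are hypothetical, and it assumes each query may see its own window plus the one immediately before it. For readability it masks a dense score matrix, so it costs as much as full attention; an efficient implementation would instead bucket the sequence into windows.

```python
import torch


def local_causal_attention(q, k, v, window_size):
    """Dense reference for causal local attention (illustrative only).

    q, k, v: tensors of shape (batch, seq_len, dim).
    Assumption: a query attends to keys in its own window and the
    previous window, and never to positions after itself.
    """
    _, n, d = q.shape
    pos = torch.arange(n, device=q.device)
    win = pos // window_size                      # window index of each position

    # diff[i, j] = how many windows key j lies behind query i
    diff = win[:, None] - win[None, :]
    in_range = (diff >= 0) & (diff <= 1)          # same or previous window
    not_future = pos[None, :] <= pos[:, None]     # causal constraint
    allowed = in_range & not_future               # (seq_len, seq_len) boolean mask

    scores = q @ k.transpose(-1, -2) * d ** -0.5  # scaled dot-product scores
    scores = scores.masked_fill(~allowed, float("-inf"))
    attn = scores.softmax(dim=-1)                 # masked keys get weight 0
    return attn @ v


# Usage: 8 tokens, windows of 2 tokens each.
q, k, v = (torch.randn(1, 8, 4) for _ in range(3))
out = local_causal_attention(q, k, v, window_size=2)  # shape (1, 8, 4)
```

Every position can always attend to itself, so no softmax row is fully masked. The dense mask is a deliberate simplification chosen for clarity, not for speed.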