lucidrains/performer-pytorch
A PyTorch implementation of Performer, a transformer variant using linear attention via FAVOR+ random feature approximation.

This repository provides a PyTorch implementation of Performer, a transformer whose attention cost scales linearly with sequence length. It uses Fast Attention Via positive Orthogonal Random features (FAVOR+) to approximate softmax attention without materializing the full attention matrix. It supports both causal (autoregressive) and non-causal attention, reversible layers, and options such as feedforward chunking and scale norm. The library offers a ready-to-use PerformerLM class that can be configured for sequence length, model dimension, depth, and attention parameters.
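To make the random-feature idea concrete, here is a minimal pure-Python sketch of non-causal linear attention with positive random features. It is an illustration under stated assumptions, not the repository's code: all function names are hypothetical, the projections are plain i.i.d. Gaussian vectors rather than the orthogonal ones FAVOR+ uses, any softmax scaling is assumed to be already folded into the queries and keys, and causal masking is omitted.

```python
import math
import random


def random_projections(num_features, dim, rng):
    """Draw i.i.d. Gaussian projection vectors (FAVOR+ additionally orthogonalizes them)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]


def positive_features(x, projections):
    """Map x to strictly positive features whose dot products estimate exp(q . k)."""
    half_sq_norm = 0.5 * sum(v * v for v in x)
    scale = 1.0 / math.sqrt(len(projections))
    return [
        scale * math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - half_sq_norm)
        for w in projections
    ]


def linear_attention(queries, keys, values, projections):
    """Non-causal attention that summarizes keys and values once, then reuses the summary."""
    q_feats = [positive_features(q, projections) for q in queries]
    k_feats = [positive_features(k, projections) for k in keys]
    m, d_v = len(projections), len(values[0])
    # kv[r][c] = sum_j phi(k_j)[r] * v_j[c];  k_sum[r] = sum_j phi(k_j)[r]
    kv = [[0.0] * d_v for _ in range(m)]
    k_sum = [0.0] * m
    for kf, v in zip(k_feats, values):
        for r in range(m):
            k_sum[r] += kf[r]
            for c in range(d_v):
                kv[r][c] += kf[r] * v[c]
    out = []
    for qf in q_feats:
        denom = sum(qf[r] * k_sum[r] for r in range(m))  # normalizer for this query
        out.append([sum(qf[r] * kv[r][c] for r in range(m)) / denom for c in range(d_v)])
    return out


def softmax_attention(queries, keys, values):
    """Exact unscaled softmax attention, quadratic in sequence length, for comparison."""
    out = []
    for q in queries:
        weights = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))
        ])
    return out


# Small demo: compare the approximation against exact attention on random inputs.
rng = random.Random(0)
n, d = 6, 4
queries = [[rng.uniform(-0.3, 0.3) for _ in range(d)] for _ in range(n)]
keys = [[rng.uniform(-0.3, 0.3) for _ in range(d)] for _ in range(n)]
values = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
projections = random_projections(256, d, rng)

approx = linear_attention(queries, keys, values, projections)
exact = softmax_attention(queries, keys, values)
max_err = max(abs(a - e) for ra, re in zip(approx, exact) for a, e in zip(ra, re))
print("max abs error vs exact softmax attention:", max_err)
```

Because every feature is positive, each output row is a proper weighted average of the value rows, which is what keeps the normalizer well behaved. The key/value summary is built in one pass over the sequence, so the cost grows linearly with sequence length for a fixed number of random features; raising that number typically tightens the approximation at the price of more compute.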