
lucidrains/performer-pytorch

A PyTorch implementation of Performer, a transformer variant using linear attention via FAVOR+ random feature approximation.

1.2k stars · Python · Language Models · ML Frameworks
Velocity · 7d: +0.6 ★ / day
Trend: steady

This repository provides a PyTorch implementation of the Performer, a transformer whose attention cost grows linearly with sequence length. It uses Fast Attention Via positive Orthogonal Random features (FAVOR+) to approximate softmax attention without materializing the full attention matrix. It supports both causal (autoregressive) and non-causal attention, reversible layers, and options such as feedforward chunking and ScaleNorm. The library also offers a ready-to-use PerformerLM class that is configurable for sequence length, model dimension, depth, and attention parameters.
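To make the idea behind FAVOR+ concrete, here is a minimal, dependency-free sketch of attention computed through positive random features, compared against exact softmax attention. It is not the repository's code: the function names (`positive_random_features`, `linear_attention`, `softmax_attention`) are hypothetical, it uses plain i.i.d. Gaussian projections instead of the orthogonal ones FAVOR+ calls for, and it omits the numerical stabilization a production implementation would need.

```python
import math
import random


def positive_random_features(x, proj):
    """Map x to m strictly positive features: exp(w.x - |x|^2 / 2) / sqrt(m)."""
    half_sq_norm = sum(v * v for v in x) / 2.0
    inv_sqrt_m = 1.0 / math.sqrt(len(proj))
    return [
        math.exp(sum(wi * xi for wi, xi in zip(w, x)) - half_sq_norm) * inv_sqrt_m
        for w in proj
    ]


def linear_attention(Q, K, V, proj, causal=False):
    """Approximate softmax attention without building the n x n score matrix."""
    d, m, dv = len(Q[0]), len(proj), len(V[0])
    scale = d ** -0.25  # scaling q and k this way gives q.k / sqrt(d) in the kernel
    q_feats = [positive_random_features([v * scale for v in q], proj) for q in Q]
    k_feats = [positive_random_features([v * scale for v in k], proj) for k in K]

    kv = [[0.0] * dv for _ in range(m)]  # running sum of phi(k_i) v_i^T
    k_sum = [0.0] * m                    # running sum of phi(k_i), for the normalizer

    def absorb(kf, v):
        for j in range(m):
            k_sum[j] += kf[j]
            for c in range(dv):
                kv[j][c] += kf[j] * v[c]

    if not causal:  # non-causal: every query sees every key
        for kf, v in zip(k_feats, V):
            absorb(kf, v)

    out = []
    for t, qf in enumerate(q_feats):
        if causal:  # causal: prefix sums, so query t sees keys 0..t only
            absorb(k_feats[t], V[t])
        denom = sum(a * b for a, b in zip(qf, k_sum))
        out.append([sum(qf[j] * kv[j][c] for j in range(m)) / denom for c in range(dv)])
    return out


def softmax_attention(Q, K, V, causal=False):
    """Exact reference: quadratic in sequence length."""
    d = len(Q[0])
    out = []
    for t, q in enumerate(Q):
        last = t + 1 if causal else len(K)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K[:last]]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * V[i][c] for i, w in enumerate(weights)) / total
                    for c in range(len(V[0]))])
    return out


# Toy comparison; the sizes and value ranges are arbitrary illustration choices.
rng = random.Random(0)
n, d, m = 6, 4, 2048
Q = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
K = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
V = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

approx = linear_attention(Q, K, V, proj)
exact = softmax_attention(Q, K, V)
max_err = max(abs(a - e) for row_a, row_e in zip(approx, exact)
              for a, e in zip(row_a, row_e))
print(f"max abs error with {m} random features: {max_err:.4f}")
```

Because the key/value summaries `kv` and `k_sum` have a fixed size that depends only on the number of features, the work per query does not grow with sequence length. In causal mode the same summaries are updated one position at a time, which is what makes autoregressive use possible.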
