Is performer-pytorch open source?

Yes — lucidrains/performer-pytorch is open source, released under the MIT license.

What language is performer-pytorch written in?

lucidrains/performer-pytorch is primarily written in Python.

How popular is performer-pytorch?

lucidrains/performer-pytorch has 1.2k stars on GitHub.

Where can I find performer-pytorch?

lucidrains/performer-pytorch is on GitHub at https://github.com/lucidrains/performer-pytorch.

lucidrains/performer-pytorch

Drop-in linear attention for when quadratic scaling hurts

Implements the Performer architecture so you can trade exact softmax attention for linear-time FAVOR+ approximations, either as a full model or a surgical replacement layer.

★ 1.2k stars · Python · Language Models · ML Frameworks

What it does

This repository provides a PyTorch implementation of the Performer transformer, which uses FAVOR+ (Fast Attention Via positive Orthogonal Random features) to approximate softmax attention with random features. As a result, the time and memory cost of attention scales linearly with sequence length rather than quadratically. It ships as a complete language model, a plain modality-agnostic backbone, an encoder-decoder pair, and standalone SelfAttention and CrossAttention modules you can graft into existing code.
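To make the approximation concrete, here is a minimal sketch in plain Python (not the library's code) of the positive random feature map behind FAVOR+. It assumes the usual 1/sqrt(d) scaling is already folded into q and k, and it draws independent Gaussian projections instead of the orthogonal ones FAVOR+ uses, which keeps the estimate unbiased but with higher variance. The names positive_random_features and projections, and the sample vectors, are illustrative only.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def positive_random_features(x, projections):
    """Map x to strictly positive features whose dot products estimate exp(q . k)."""
    half_sq_norm = dot(x, x) / 2.0
    scale = 1.0 / math.sqrt(len(projections))
    return [scale * math.exp(dot(w, x) - half_sq_norm) for w in projections]


rng = random.Random(0)
dim, nb_features = 4, 20000

# Assumption: plain i.i.d. Gaussian rows. FAVOR+ additionally orthogonalizes them.
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(nb_features)]

q = [0.3, -0.2, 0.1, 0.4]
k = [0.1, 0.2, -0.3, 0.2]

exact = math.exp(dot(q, k))  # unnormalized softmax kernel
estimate = dot(
    positive_random_features(q, projections),
    positive_random_features(k, projections),
)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.4f} estimate={estimate:.4f} rel_err={relative_error:.4f}")
```

Because every feature is positive, the estimated attention weights stay positive as well, which is what lets them be normalized like softmax weights.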

The interesting bit

Instead of forcing you to adopt a whole new model ecosystem, the library exposes FastAttention as a low-level primitive, letting you replace just the attention step in a pre-trained transformer. It also bundles extras from contemporaneous papers (reversible layers, rotary position embeddings, GLU feedforward variants, and local attention heads), making it feel like a rolling survey of transformer improvements from around 2020.
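The idea of swapping only the attention step can be sketched without the library. The plain-Python example below is an illustration under stated assumptions, not FastAttention itself: it is non-causal, uses independent Gaussian projections rather than orthogonal ones, assumes any 1/sqrt(d) scaling is already applied to q and k, and all function names are hypothetical. Both functions take q, k, v and return the same output shape, so one can stand in for the other; the linear version never forms the n x n score matrix.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_attention(q, k, v):
    """Exact attention: builds the full n x n score matrix (quadratic in n)."""
    out = []
    for row in matmul(q, transpose(k)):
        weights = [math.exp(s) for s in row]
        total = sum(weights)
        mixed = matmul([weights], v)[0]
        out.append([x / total for x in mixed])
    return out


def feature_map(x, projections):
    """Positive random features, one row of nb_features values per input vector."""
    scale = 1.0 / math.sqrt(len(projections))
    rows = []
    for vec in x:
        half_sq_norm = sum(t * t for t in vec) / 2.0
        rows.append([
            scale * math.exp(sum(w_i * t for w_i, t in zip(w, vec)) - half_sq_norm)
            for w in projections
        ])
    return rows


def linear_attention(q, k, v, projections):
    """Approximate attention computed as q' (k'^T v): linear in sequence length."""
    q_feat = feature_map(q, projections)         # n x m
    k_feat = feature_map(k, projections)         # n x m
    kv = matmul(transpose(k_feat), v)            # m x d, summed over the sequence once
    k_sum = [sum(col) for col in zip(*k_feat)]   # m, used for normalization
    out = []
    for q_row, numer in zip(q_feat, matmul(q_feat, kv)):
        denom = sum(a * b for a, b in zip(q_row, k_sum))
        out.append([x / denom for x in numer])
    return out


rng = random.Random(0)
seq_len, dim, nb_features = 6, 4, 4000


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


q, k, v = rand_matrix(seq_len, dim), rand_matrix(seq_len, dim), rand_matrix(seq_len, dim)
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(nb_features)]

exact = softmax_attention(q, k, v)
approx = linear_attention(q, k, v, projections)
max_abs_diff = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print(f"max abs difference: {max_abs_diff:.4f}")
```

The reordering matters because k'^T v has a size that depends only on the feature count and head dimension, so the cost grows with sequence length times feature count instead of sequence length squared.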

Key highlights

FAVOR+ linear attention via random orthogonal features, with configurable nb_features and redraw intervals.
Modular exports: PerformerLM, Performer, PerformerEncDec, SelfAttention, CrossAttention, and FastAttention.
Hybrid attention: mix global Performer heads with local windowed attention.
Deterministic outputs after training: fix_projection_matrices_() freezes the random projections so they are no longer redrawn.
Optional add-ons from related papers: rotary embeddings, reversible layers, chunked feedforward, and ReZero/ScaleNorm toggles.
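As a rough illustration of how these options fit together, the sketch below collects a few of them into a configuration for PerformerLM. The keyword names are recalled from the project's README and should be treated as assumptions to check against the version you install; the values are arbitrary, and build_performer_lm and freeze_for_inference are hypothetical helpers, not part of the library.

```python
# Assumed keyword names (recalled from the README); verify against your installed version.
PERFORMER_LM_CONFIG = dict(
    num_tokens=20000,              # vocabulary size
    max_seq_len=2048,              # longest sequence the model accepts
    dim=512,
    depth=6,
    heads=8,
    causal=True,                   # autoregressive language modelling
    nb_features=256,               # number of random features per head
    feature_redraw_interval=1000,  # how often projections are redrawn in training
    local_attn_heads=4,            # heads that use windowed local attention
    local_window_size=256,
)


def build_performer_lm(config=PERFORMER_LM_CONFIG):
    """Hypothetical helper: construct the model from the config above."""
    # Imported lazily so this snippet loads even without the package installed.
    from performer_pytorch import PerformerLM

    return PerformerLM(**config)


def freeze_for_inference(model):
    """Hypothetical helper: stop redrawing projections so outputs are repeatable."""
    model.eval()
    model.fix_projection_matrices_()
    return model
```

With the package installed, freeze_for_inference(build_performer_lm()) would return a model whose random projections stay fixed between calls. Keeping local_attn_heads below heads leaves some heads on global Performer attention, which is the hybrid setup described above.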

Verdict

Worth a look if you are hitting sequence-length limits and want a battle-tested PyTorch implementation you can either run standalone or dissect for parts. Skip it if your sequences are short enough that standard attention is already fast and you don't need the extra knobs.
