lucidrains/reformer-pytorch
A PyTorch implementation of Reformer, the efficient Transformer, with LSH attention, reversible layers, and chunking for memory-efficient training.

This repository provides a PyTorch implementation of the Reformer architecture, an efficient Transformer variant designed to reduce memory consumption during training and inference. It implements the key techniques from the original paper: locality-sensitive hashing (LSH) attention, which lets each query attend only to keys in the same hash bucket instead of to the full sequence; reversible residual layers, which recompute activations during backpropagation instead of storing them; and chunking, which processes the feed-forward computation in slices so that large intermediate activations are not held in memory all at once. The implementation supports auto-regressive language modeling and has been validated on the enwik8 dataset with sequences of up to 81k tokens using half precision.
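
As a rough illustration of why reversible residual layers save memory, the sketch below shows the general technique in plain Python. It is not the repository's code: the functions f and g are hypothetical stand-ins for the attention and feed-forward sub-layers, and the helper names are assumptions made for this example. The point is that a block's inputs can be recovered exactly from its outputs, so intermediate activations need not be stored for the backward pass.

```python
# Illustrative sketch only: a reversible residual block in plain Python.
# f and g are hypothetical stand-ins, not the repository's sub-layers.

def f(x):
    # Stand-in for the attention sub-layer (any deterministic function works).
    return [0.5 * v + 1.0 for v in x]

def g(x):
    # Stand-in for the feed-forward sub-layer.
    return [0.1 * v * v for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def rev_forward(x1, x2):
    # y1 = x1 + f(x2), y2 = x2 + g(y1)
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2

def rev_inverse(y1, y2):
    # Undo the block using only its outputs:
    # x2 = y2 - g(y1), then x1 = y1 - f(x2)
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2

x1 = [1.0, -2.0, 3.0]
x2 = [0.5, 0.25, -1.0]
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)  # matches x1, x2 up to floating-point rounding
```

In a real model the same inversion would run layer by layer during backpropagation, trading extra computation for lower activation memory.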