lucidrains/recurrent-memory-transformer-pytorch
A PyTorch implementation of the Recurrent Memory Transformer architecture for processing long sequences using memory tokens.

Velocity (7d): +0.4 ★/day · Trend: steady
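This repository is a PyTorch implementation of the Recurrent Memory Transformer (RMT) paper. RMT adds a small set of memory tokens to a transformer so that it can handle very long contexts. The input is split into segments, and the memory embeddings produced while processing one segment are fed as input to the next, where they act as a compressed summary of everything seen so far. Because only the memory is carried forward, the model can in principle retain information across sequences of arbitrary length, while the attention cost of each step depends only on the segment length plus the number of memory tokens. The implementation supports flash attention and targets autoregressive sequence modeling.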
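The segment-level recurrence described above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not this repository's API: `ToyRMT` and `run_long_sequence` are hypothetical names, a stock `nn.TransformerEncoder` with a causal mask stands in for the repo's own attention (no flash attention), and the layout `[read memory | segment tokens | write memory]` is an assumption modeled on the RMT paper's decoder setup. The outputs at the write-memory positions become the read memory for the next segment.

```python
import torch
import torch.nn as nn


class ToyRMT(nn.Module):
    """Hypothetical toy model: a causal transformer with recurrent memory tokens."""

    def __init__(self, num_tokens, dim, num_memory_tokens, depth=2, heads=4):
        super().__init__()
        self.num_mem = num_memory_tokens
        self.token_emb = nn.Embedding(num_tokens, dim)
        # Learned initial memory, used only for the first segment.
        self.init_memory = nn.Parameter(torch.randn(num_memory_tokens, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, segment, memory=None):
        b, n = segment.shape
        m = self.num_mem
        if memory is None:
            memory = self.init_memory.unsqueeze(0).expand(b, -1, -1)
        # Assumed layout: [read memory | segment tokens | write memory].
        x = torch.cat([memory, self.token_emb(segment), memory], dim=1)
        total = x.shape[1]
        # Boolean causal mask: True marks positions a query may NOT attend to.
        causal = torch.triu(
            torch.ones(total, total, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.transformer(x, mask=causal)
        logits = self.to_logits(h[:, m:m + n])
        # Write-memory outputs have seen the whole segment; they become the next read memory.
        new_memory = h[:, m + n:]
        return logits, new_memory


def run_long_sequence(model, tokens, seg_len):
    """Process a long sequence segment by segment, carrying memory forward."""
    memory = None
    all_logits = []
    for start in range(0, tokens.shape[1], seg_len):
        logits, memory = model(tokens[:, start:start + seg_len], memory)
        # For truncated backpropagation through time, one could use memory.detach() here.
        all_logits.append(logits)
    return torch.cat(all_logits, dim=1), memory


torch.manual_seed(0)
model = ToyRMT(num_tokens=256, dim=64, num_memory_tokens=4).eval()
tokens = torch.randint(0, 256, (2, 50))
logits, memory = run_long_sequence(model, tokens, seg_len=16)
print(logits.shape, memory.shape)  # torch.Size([2, 50, 256]) torch.Size([2, 4, 64])
```

Only the `(batch, num_memory_tokens, dim)` memory tensor crosses segment boundaries, so the last segment's predictions can still depend on tokens from the first segment, even though no attention spans the full 50-token sequence.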