lucidrains/memorizing-transformers-pytorch

A PyTorch implementation of Memorizing Transformers, a transformer architecture augmented with approximate nearest neighbor memory retrieval.

644 stars · Python · Language Models · ML Frameworks
Velocity · 7d: +0.4 ★ / day
Trend: steady

This repository provides a PyTorch implementation of the Memorizing Transformers paper from ICLR 2022. The model augments standard transformer attention with an external memory that is queried by approximate nearest neighbor (ANN) search. The model retrieves the stored keys and values of relevant past tokens from this memory, extending its usable context beyond the local attention window. The implementation uses cosine-similarity attention with a learned temperature for the KNN attention layer, and supports hybrid attention across local and distant contexts.
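To make the retrieval-plus-attention idea concrete, here is a minimal, dependency-free sketch in plain Python rather than PyTorch. It is not the repository's code: the function names (`top_k_memories`, `hybrid_cosine_attention`) are hypothetical, an exact brute-force top-k stands in for the approximate nearest neighbor index, the temperature is a fixed number rather than a learned parameter, and "hybrid attention" is assumed here to mean a single softmax over local and retrieved logits.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_memories(query, mem_keys, mem_values, k):
    """Exact top-k by cosine similarity; a stand-in for an ANN index."""
    q = l2_normalize(query)
    ranked = sorted(
        range(len(mem_keys)),
        key=lambda i: dot(q, l2_normalize(mem_keys[i])),
        reverse=True,
    )[:k]
    return [mem_keys[i] for i in ranked], [mem_values[i] for i in ranked]


def hybrid_cosine_attention(query, local_keys, local_values,
                            mem_keys, mem_values, k=2, temperature=10.0):
    """Attend jointly over local keys and the k retrieved memory keys.

    Logits are cosine similarities multiplied by `temperature`
    (assumed fixed here; the description says it is learned).
    """
    top_keys, top_values = top_k_memories(query, mem_keys, mem_values, k)
    keys = local_keys + top_keys
    values = local_values + top_values

    q = l2_normalize(query)
    logits = [temperature * dot(q, l2_normalize(key)) for key in keys]

    # Numerically stable softmax over local and memory logits together.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights
```

Because the keys and the query are normalized, every logit lies in the range from `-temperature` to `+temperature`, so the temperature alone controls how sharply attention concentrates on the best-matching local or retrieved entries.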
