Jamie-Stirling/RetNet
A minimal, pure PyTorch implementation of RetNet, a neural architecture proposed as an alternative to Transformers for large language models.

This repository provides a PyTorch implementation of the Retentive Network (RetNet) architecture described in the paper ‘Retentive Network: A Successor to Transformer for Large Language Models’. It implements single-scale and multi-scale retention in three computation paradigms: parallel, recurrent, and chunkwise. The codebase includes multi-layer networks with feed-forward layers and layer normalization, plus a causal language model built on the retentive architecture. It prioritizes correctness and readability over performance optimization.
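
To illustrate why the parallel and recurrent paradigms are interchangeable, here is a minimal sketch of single-scale retention in both forms. It is a simplification, not this repository's API: the function names `retention_parallel` and `retention_recurrent` are hypothetical, and the sketch assumes pre-computed `q`, `k`, `v` tensors and a scalar decay `gamma`, leaving out the learned projections, the position-dependent rotation of queries and keys, and the normalization and gating used in the paper.

```python
import torch


def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V, where D is a causal decay mask.

    q, k: (seq_len, d_k); v: (seq_len, d_v); gamma: float in (0, 1).
    Hypothetical helper for illustration only.
    """
    n = q.shape[0]
    idx = torch.arange(n)
    diff = idx[:, None] - idx[None, :]  # entry [i, j] holds i - j
    # D[i, j] = gamma^(i - j) for i >= j, and 0 above the diagonal.
    decay = torch.tril(gamma ** diff.clamp(min=0).to(q.dtype))
    return ((q @ k.T) * decay) @ v


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    d_k, d_v = k.shape[1], v.shape[1]
    state = torch.zeros(d_k, d_v, dtype=q.dtype)
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the running state, then add the current key-value outer product.
        state = gamma * state + torch.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return torch.stack(outputs)


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(6, 4, dtype=torch.float64) for _ in range(3))
    out_par = retention_parallel(q, k, v, gamma=0.9)
    out_rec = retention_recurrent(q, k, v, gamma=0.9)
    print(torch.allclose(out_par, out_rec))
```

Under these assumptions both functions compute the same output, since each position's result is the sum over earlier positions of `gamma^(n - m) * (q_n · k_m) * v_m`. The parallel form suits training, where the whole sequence is available at once, while the recurrent form carries a fixed-size state from token to token, which suits step-by-step decoding.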