majumderb/rezero
A PyTorch library implementing ReZero, a technique that initializes neural network layers as identity maps to enable fast convergence in deep Transformers.

This repository provides the ReZero-Transformer implementation, a drop-in replacement for PyTorch’s Transformer. ReZero adds a single learned scalar per layer, initialized to zero, that scales the layer's residual branch, so every layer starts out as an identity map regardless of what it computes. This improves gradient flow in deep networks: the technique enables training Transformers with over a hundred layers and was shown to converge 56% faster on the enwik8 language modeling benchmark. It also applies to ResNets and fully connected networks.
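To make the identity-at-initialization idea concrete, here is a minimal, framework-free sketch of the residual update x + alpha * F(x) in plain Python. It is not this repository's API: the names `sublayer`, `rezero_block`, and `rezero_stack` are hypothetical, and the dense-plus-tanh sublayer is only an assumed stand-in for whatever a real layer computes (self-attention, a feed-forward block, a convolution).

```python
import math

def sublayer(x, weights):
    """Hypothetical stand-in for F(x): a dense layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

def rezero_block(x, weights, alpha):
    """Return x + alpha * F(x), with alpha the per-layer residual weight."""
    fx = sublayer(x, weights)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]

def rezero_stack(x, layers):
    """Apply a sequence of (weights, alpha) blocks in order."""
    for weights, alpha in layers:
        x = rezero_block(x, weights, alpha)
    return x

# Arbitrary illustrative values, not taken from any experiment.
weights = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5], [-0.75, 2.0, 1.0]]
x = [1.0, -2.0, 0.5]

# With alpha = 0 in every block, a 128-layer stack returns its input unchanged.
out_init = rezero_stack(x, [(weights, 0.0)] * 128)

# Once alpha moves away from zero, the block starts to transform its input.
out_trained = rezero_block(x, weights, 0.1)
```

In a real model, alpha would be a trainable parameter updated by the optimizer alongside the layer weights; here it is passed in by hand only to show the two regimes.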