lucidrains/mixture-of-experts
A PyTorch library implementing Sparsely-Gated Mixture of Experts, which increases a language model's parameter count without a proportional increase in computation.

This repository provides a PyTorch implementation of the Sparsely-Gated Mixture of Experts architecture, originally introduced by researchers at Google. It allows very large increases in parameter count while keeping per-token computation roughly constant, because a gating mechanism activates only a subset of the expert networks for each input token. The library includes configurable capacity factors, auxiliary expert-balancing losses, and policies for top-k expert selection during training and evaluation.
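
To make the gating idea concrete, here is a minimal sketch of top-k routing with a capacity limit and a load-balancing term. It is written in plain Python with only the standard library so the routing logic is easy to follow, and it is not this library's API: the names `route_top_k` and `balance_loss`, the capacity formula, and the specific form of the balancing term (number of experts times the sum over experts of routed fraction times mean gate probability, a commonly used formulation) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2, capacity_factor=1.0):
    """Hypothetical helper: assign each token to its top-k experts.

    gate_logits is a list with one entry per token, each a list of
    per-expert logits. Tokens are processed in order, and an expert
    that is already full drops any further tokens routed to it.
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    # Assumed capacity rule: an even share of the k * num_tokens
    # routing slots, scaled by capacity_factor.
    capacity = max(1, math.ceil(capacity_factor * k * num_tokens / num_experts))
    load = [0] * num_experts
    assignments = []
    for logits in gate_logits:
        probs = softmax(logits)
        # Indices of the k experts with the highest gate probability.
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        kept = []
        for e in top:
            if load[e] < capacity:
                load[e] += 1
                # Combine weight, renormalized over the selected experts.
                kept.append((e, probs[e] / norm))
        assignments.append(kept)
    return assignments, load, capacity


def balance_loss(gate_logits):
    """Hypothetical auxiliary loss that grows when routing is uneven."""
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    frac_routed = [0.0] * num_experts  # share of tokens whose top-1 is e
    mean_prob = [0.0] * num_experts    # average gate probability of e
    for logits in gate_logits:
        probs = softmax(logits)
        top1 = max(range(num_experts), key=lambda e: probs[e])
        frac_routed[top1] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return num_experts * sum(f * p for f, p in zip(frac_routed, mean_prob))
```

In this sketch, a batch in which every token prefers the same expert fills that expert up to its capacity and leaves the remaining tokens with an empty assignment list, while `balance_loss` rises above the value of about 1 that it takes under perfectly even routing. A real implementation would operate on batched tensors and typically passes dropped tokens through a residual connection; those details are omitted here.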