davidmrau/mixture-of-experts
PyTorch implementation of the sparsely-gated Mixture-of-Experts layer for scaling neural networks.

Velocity (7d): +0.5 ★/day · Trend: steady
This repository provides a PyTorch re-implementation of the sparsely-gated MoE layer described in the paper ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’ (Shazeer et al., 2017). Because only a few expert subnetworks are activated for each input, the layer lets a model's parameter count grow to billions without a proportional increase in computation per example. FastMoE used it as a reference implementation, and the repository includes working training and evaluation examples on CIFAR-10 and on dummy data.
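
To make the idea of sparse activation concrete, here is a minimal sketch of a top-k gated mixture-of-experts layer in PyTorch. It is not the repository's code or API: the class name `TinySparseMoE`, its constructor arguments, and the two-layer expert networks are all assumptions made for illustration. It also leaves out the noisy gating and load-balancing loss described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySparseMoE(nn.Module):
    """Illustrative top-k gated MoE layer (hypothetical; not the repository's API)."""

    def __init__(self, input_size, output_size, hidden_size, num_experts, k):
        super().__init__()
        self.k = k
        self.output_size = output_size
        # The gate scores every expert for every input row.
        self.gate = nn.Linear(input_size, num_experts, bias=False)
        # Each expert is a small independent feed-forward network (an assumption).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, output_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        logits = self.gate(x)                             # (batch, num_experts)
        top_vals, top_idx = logits.topk(self.k, dim=-1)   # (batch, k)
        # Softmax over the k selected experts only; all other gates stay at zero.
        top_gates = F.softmax(top_vals, dim=-1)
        gates = torch.zeros_like(logits).scatter(1, top_idx, top_gates)

        out = x.new_zeros(x.size(0), self.output_size)
        for e, expert in enumerate(self.experts):
            # Rows of the batch that routed to expert e.
            rows = gates[:, e].nonzero(as_tuple=True)[0]
            if rows.numel() == 0:
                continue  # this expert does no work for this batch
            weighted = gates[rows, e].unsqueeze(1) * expert(x[rows])
            out.index_add_(0, rows, weighted)
        return out, gates
```

Calling `TinySparseMoE(8, 4, 16, num_experts=6, k=2)` on a batch of shape `(5, 8)` returns an output of shape `(5, 4)` together with the gate matrix, in which each row has exactly two non-zero entries. Each expert runs only on the rows routed to it, so the cost per example depends on `k`, not on the total number of experts.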