
davidmrau/mixture-of-experts

PyTorch implementation of the sparsely-gated Mixture-of-Experts layer for scaling neural networks.

1.2k stars · Python · ML Frameworks
mixture-of-experts
Velocity (7d): +0.5 ★/day · Trend: steady

This repository provides a PyTorch re-implementation of the sparsely-gated Mixture-of-Experts (MoE) layer described in the influential paper ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’. For each input, the layer activates only a few expert subnetworks within a larger model, so capacity can scale to billions of parameters without a proportional increase in computation. It was used as a reference implementation by FastMoE and includes working examples for training and evaluation on CIFAR-10 and on dummy data.
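To make "sparse activation" concrete, here is a minimal, dependency-free sketch of top-k gating. It is not the repository's code: the repository works on PyTorch tensors, and the paper's gating additionally adds trainable noise and a load-balancing loss, both omitted here. The names `top_k_gates`, `moe_forward`, `gate_weights`, and `experts` are hypothetical and chosen only for illustration.

```python
import math


def top_k_gates(logits, k):
    """Map expert index -> gate weight for the k largest logits.

    The weights are a softmax restricted to the selected experts,
    so they sum to 1. Experts that are not selected do not appear.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, gate_weights, experts, k=2):
    """Route one input vector through the k highest-scoring experts.

    gate_weights holds one row per expert (assumed linear gating, no noise).
    Only the selected experts are evaluated; the rest are skipped entirely.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gates(logits, k)
    out = [0.0] * len(experts[0](x))
    for i, g in gates.items():
        y = experts[i](x)
        out = [o + g * v for o, v in zip(out, y)]
    return out, gates


# Toy setup: 4 experts, each scaling its input by a different constant.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, gates = moe_forward([1.0, 2.0], gate_weights, experts, k=2)
print(sorted(gates))  # indices of the experts that were actually run
print(output)
```

With `k=2` and four experts, only half of the expert computation runs for this input; adding more experts increases parameter count while the per-input cost stays tied to `k`.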
