
lucidrains/mixture-of-experts

A PyTorch library implementing Sparsely-Gated Mixture of Experts, which increases a language model's parameter count without a proportional increase in computation.

859 stars · Python · ML Frameworks · Language Models · mixture-of-experts
Velocity (7d): +0.4 ★/day · Trend: steady

This repository provides a PyTorch implementation of the Sparsely-Gated Mixture of Experts architecture, originally from Google. It allows large increases in model parameter count while keeping per-token computation roughly constant, because a gating mechanism activates only a small subset of expert networks for each input token. The library includes configurable capacity factors, auxiliary expert-balancing losses, and policies for top-k expert selection during training and evaluation.
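The routing idea described above can be illustrated with a small, framework-free sketch. This is not the library's code or API: the function names `softmax` and `route_top_k`, the capacity formula, and the particular balancing loss (number of experts times the sum over experts of top-1 token fraction times mean gate probability) are assumptions chosen for illustration, written in plain Python so that it runs without PyTorch.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2, capacity_factor=1.25):
    """Hypothetical top-k router (illustrative only, not the library's API).

    gate_logits: one list of per-expert logits for each token.
    Returns (assignments, aux_loss), where assignments[i] is a list of
    (expert_index, weight) pairs for token i.
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    # Assumed capacity rule: each expert accepts at most this many tokens;
    # tokens routed to a full expert are dropped for that expert.
    capacity = max(1, math.ceil(capacity_factor * k * num_tokens / num_experts))

    probs_all = [softmax(row) for row in gate_logits]
    load = [0] * num_experts
    assignments = []
    for probs in probs_all:
        # Pick the k experts with the highest gate probability.
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        chosen = []
        for e in top:
            if load[e] < capacity:
                load[e] += 1
                chosen.append((e, probs[e] / norm))
        assignments.append(chosen)

    # One common balancing loss (an assumption here): it equals 1.0 when
    # routing is perfectly uniform and grows as routing concentrates.
    top1_frac = [0.0] * num_experts
    for probs in probs_all:
        best = max(range(num_experts), key=lambda e: probs[e])
        top1_frac[best] += 1.0 / num_tokens
    mean_prob = [sum(p[e] for p in probs_all) / num_tokens for e in range(num_experts)]
    aux_loss = num_experts * sum(f * m for f, m in zip(top1_frac, mean_prob))
    return assignments, aux_loss

if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0, -1.0], [0.0, 3.0, 1.0, 0.0]]
    routes, loss = route_top_k(logits, k=2)
    print(routes, loss)
```

Only the experts listed in each token's assignment would run on that token, which is why per-token compute depends on `k` rather than on the total number of experts. The capacity limit bounds how much work any single expert receives, and the auxiliary loss gives the gate a training signal to spread tokens more evenly.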
