
pjlab-sys4nlp/llama-moe

A project that constructs sparse Mixture-of-Experts models by partitioning LLaMA's feed-forward networks into experts with top-K routing gates.

1k stars · Python · Language Models · ML Frameworks
Velocity · 7d: +1.0 ★ / day
Trend: steady

LLaMA-MoE builds sparse Mixture-of-Experts models from LLaMA by partitioning each layer's feed-forward network into experts and inserting a top-K routing gate at each layer. The initialized MoE models are then continually pre-trained using optimized data sampling over SlimPajama and filtered datasets. The resulting models activate fewer parameters per token (3.0-3.5B) than dense LLaMA models while maintaining language modeling capability.
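The partition-and-route idea described above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the contiguous equal-size split in `partition_neurons`, the softmax over only the selected experts in `top_k_route`, and all function names and toy dimensions are assumptions made for illustration. It relies on one fact about a SwiGLU feed-forward layer: the output is a sum over intermediate neurons, so disjoint groups of neurons form experts whose outputs add up to the dense result.

```python
import math
import random


def silu(v):
    return v / (1.0 + math.exp(-v))


def ffn_neurons(x, w_gate, w_up, w_down, neuron_ids):
    """SwiGLU feed-forward pass restricted to a subset of intermediate neurons."""
    out = [0.0] * len(w_down)
    for j in neuron_ids:
        g = sum(w * xi for w, xi in zip(w_gate[j], x))
        u = sum(w * xi for w, xi in zip(w_up[j], x))
        h = silu(g) * u
        for i in range(len(out)):
            out[i] += w_down[i][j] * h
    return out


def partition_neurons(d_ff, num_experts):
    """Assumed scheme: split neuron indices into equal contiguous groups."""
    assert d_ff % num_experts == 0
    size = d_ff // num_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(num_experts)]


def top_k_route(x, w_router, k):
    """Pick the k experts with the largest router logits; softmax over those k."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in w_router]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - peak) for e in top}
    total = sum(exps.values())
    return {e: exps[e] / total for e in top}


def moe_forward(x, w_gate, w_up, w_down, groups, w_router, k):
    """Weighted sum of the outputs of the k selected experts only."""
    out = [0.0] * len(w_down)
    for e, score in top_k_route(x, w_router, k).items():
        y = ffn_neurons(x, w_gate, w_up, w_down, groups[e])
        out = [o + score * yi for o, yi in zip(out, y)]
    return out


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions, chosen only for illustration.
rng = random.Random(0)
d_model, d_ff, num_experts = 4, 8, 4
w_gate = rand_matrix(rng, d_ff, d_model)
w_up = rand_matrix(rng, d_ff, d_model)
w_down = rand_matrix(rng, d_model, d_ff)
w_router = rand_matrix(rng, num_experts, d_model)  # new, randomly initialized gate
x = [rng.uniform(-1, 1) for _ in range(d_model)]

groups = partition_neurons(d_ff, num_experts)
dense = ffn_neurons(x, w_gate, w_up, w_down, range(d_ff))
per_expert = [ffn_neurons(x, w_gate, w_up, w_down, g) for g in groups]
expert_sum = [sum(col) for col in zip(*per_expert)]
sparse = moe_forward(x, w_gate, w_up, w_down, groups, w_router, k=2)
```

Here `expert_sum` matches `dense` up to floating-point error, which shows that the partition itself loses nothing. `sparse` evaluates only 2 of the 4 experts, so it touches half of the feed-forward parameters for this token. A freshly initialized router like `w_router` has no learned preferences, which is one reason a model built this way would need the continued pre-training the description mentions.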
