deepseek-ai/DeepSeek-MoE
A Mixture-of-Experts architecture for large language models, aimed at expert specialization.

Velocity (7d): +2.2 ★/day · Trend: → steady
DeepSeek-MoE implements a Mixture-of-Experts architecture for language models, designed to push expert specialization as far as possible. It uses sparse activation: each token is routed to only a small subset of specialized expert networks rather than through every expert, which keeps the compute per token low. As a foundation-model research project, the repository provides model weights, training code, and evaluation results.
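
To make the idea of sparse routing concrete, here is a minimal sketch of generic top-k gating in plain Python. It is not taken from the DeepSeek-MoE repository: the names `route_token`, `moe_layer`, and `make_expert`, the toy "experts" that just scale their input, the random gate weights, and the choice to renormalize the selected gate weights are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, top_k):
    """Return [(expert_index, weight)] for the top_k experts of one token."""
    # One affinity score per expert: dot product of the token with that
    # expert's gate vector (hypothetical gating scheme).
    scores = [sum(t * w for t, w in zip(token, row)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k highest-scoring experts; all others stay inactive.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize the kept weights so they sum to 1.
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, gate_weights, experts, top_k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token)
    for idx, weight in route_token(token, gate_weights, top_k):
        expert_out = experts[idx](token)  # only top_k experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


def make_expert(scale):
    """Toy stand-in for an expert network: scales its input."""
    return lambda x: [scale * v for v in x]


random.seed(0)
dim, num_experts = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [make_expert(i + 1) for i in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
routing = route_token(token, gate_weights, top_k=2)
output = moe_layer(token, gate_weights, experts, top_k=2)
print(routing)
print(output)
```

With 8 experts and `top_k=2`, only two expert functions run per token, which is the source of the efficiency that sparse activation is meant to deliver. A real model would replace the toy experts with feed-forward networks and learn the gate weights during training.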