
deepseek-ai/DeepSeek-MoE

A Mixture-of-Experts architecture aimed at expert specialization in large language models.

1.9k stars · Python · Language Models
Velocity (7d): +2.2 ★ / day
Trend: steady
[Chart: star history]

DeepSeek-MoE implements a Mixture-of-Experts (MoE) architecture for language models that aims for ultimate expert specialization. The architecture uses sparse activation: each token is routed through only a small subset of specialized expert networks, so only a fraction of the model's parameters is active for any given token. As a foundation model research project, the repository provides model weights, training code, and evaluation results.
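To make sparse routing concrete, here is a minimal sketch of generic top-k gating in plain Python. It is not taken from the DeepSeek-MoE code. The names `route_token`, `make_expert`, and `gate_weights` are hypothetical, the experts are toy scaling functions, and renormalizing the gate weights over the selected experts is an assumption made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Toy expert (hypothetical): multiplies every feature by `scale`."""
    return lambda token: [scale * x for x in token]


def route_token(token, gate_weights, experts, top_k=2):
    """Send one token through its top_k experts and mix their outputs.

    gate_weights holds one row per expert; the gate logit for an expert
    is the dot product of its row with the token vector.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)

    # Sparse activation: only the top_k highest-scoring experts are evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Assumption: renormalize gate weights over the chosen experts.
    norm = sum(probs[i] for i in chosen)

    output = [0.0] * len(token)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = route_token([2.0, 1.0], gate_weights, experts, top_k=2)
print(chosen)  # indices of the experts that were actually evaluated
```

With four experts and `top_k=2`, only two expert functions run for the token; the other two are skipped entirely, which is where the compute savings of sparse activation come from.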
