deepseek-ai/DeepSeek-V2
DeepSeek-V2 is a Mixture-of-Experts large language model with sparse activation, developed by DeepSeek AI.
★ 5k stars · Language Models

Velocity (7d): +6.4 ★/day
Trend: steady

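The repository contains weights, training code, and configuration for DeepSeek-V2, a Mixture-of-Experts (MoE) model with 236B total parameters, of which roughly 21B are activated per token. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache into a low-rank latent vector, with the DeepSeekMoE architecture, which routes each token to a small subset of experts (6 of 160 routed experts, plus 2 shared experts that are always active). Because only a fraction of the parameters runs for any given token, the model is designed to be cheaper to train and faster at inference than a dense model of equivalent capacity.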
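
The sparse activation described above can be illustrated with a generic top-k router. This is a minimal sketch of the general technique, not DeepSeek-V2's actual router: the names `route_top_k` and `moe_forward`, the toy scalar experts, the gate scores, and the choice to renormalize weights over the selected experts are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1
    (an illustrative choice, not necessarily what DeepSeek-V2 does).
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_scores, k):
    """Combine the outputs of only the selected experts; the rest never run."""
    out = 0.0
    for idx, weight in route_top_k(gate_scores, k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy setup: 8 scalar "experts", expert i multiplies its input by i + 1.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(gate_scores, k=2)
y = moe_forward(1.0, experts, gate_scores, k=2)
print(selected, y)
```

With `k=2`, only two of the eight toy experts are evaluated for this input, which is the source of the compute savings relative to a dense layer that would evaluate all of them.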