
microsoft/Tutel

Microsoft's optimized Mixture-of-Experts implementation library for LLM inference and training with advanced quantization support.

Velocity (7d): +0.6 ★/day · Trend: steady

Tutel is a performance-optimized library for Mixture-of-Experts (MoE) architectures in large language models. It provides parallelism strategies, including No-penalty Parallelism, for models whose behavior is dynamic during training and inference. The library supports the FP8, NVFP4, MXFP4, and BlockwiseFP8 quantization formats on CUDA and ROCm GPUs for serving MoE-based models, including DeepSeek, Kimi, Qwen3, GLM-5, and GptOSS.
