microsoft/Tutel
Microsoft's optimized Mixture-of-Experts implementation library for LLM inference and training with advanced quantization support.

Velocity (7d): +0.6 ★/day · Trend: steady
Tutel is a performance-optimized library for Mixture-of-Experts (MoE) architectures in large language models. It provides parallelization strategies, including No-penalty Parallelism, for models whose behavior changes dynamically during training and inference. The library supports the FP8, NVFP4, MXFP4, and BlockwiseFP8 quantization formats on CUDA and ROCm GPUs for serving MoE-based models including DeepSeek, Kimi, Qwen3, GLM-5, and GptOSS.