bytedance/flux
A high-performance GPU kernel library enabling computation-communication overlap for distributed model training and inference.

Velocity (7d): +1.6 ★ / day
Trend: steady
Flux provides optimized GPU kernels for tensor and expert parallelism, targeting efficient training and inference of dense and Mixture-of-Experts (MoE) models. It integrates with PyTorch and overlaps communication with computation so that GPUs stay busy during distributed training and inference, rather than idling while data moves between devices. The library is designed to support the parallelism strategies used in large-scale LLM training.
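To make the benefit of overlap concrete, here is a minimal back-of-the-envelope sketch in plain Python. It is not Flux's API or implementation; it is an idealized two-stage pipeline model, and the function names, the chunk count, and the timings are all hypothetical values chosen for illustration. It assumes the work can be split into equal chunks and that the transfer of one chunk can run fully in parallel with the compute of the previous one.

```python
def serial_time(t_comm: float, t_comp: float) -> float:
    """Time when all communication finishes before any computation starts."""
    return t_comm + t_comp


def overlapped_time(t_comm: float, t_comp: float, n_chunks: int) -> float:
    """Time for an idealized two-stage pipeline (hypothetical model).

    The work is split into n_chunks equal pieces. While chunk i is being
    computed, chunk i+1 is being transferred.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    comm_chunk = t_comm / n_chunks
    comp_chunk = t_comp / n_chunks
    # The first transfer and the last compute cannot be hidden; in between,
    # each step costs as much as the slower of the two stages.
    return comm_chunk + (n_chunks - 1) * max(comm_chunk, comp_chunk) + comp_chunk


if __name__ == "__main__":
    # Assumed timings in milliseconds, purely illustrative.
    print(serial_time(4.0, 8.0))         # 12.0
    print(overlapped_time(4.0, 8.0, 4))  # 9.0
```

In this model, finer chunking hides more of the cheaper stage, and the total approaches the cost of the slower stage alone. Real kernels also pay per-chunk launch and synchronization overheads that the sketch ignores, so measured gains will be smaller.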