
bytedance/flux

A high-performance GPU kernel library enabling computation-communication overlap for distributed model training and inference.

Velocity (7d): +1.6 ★/day · Trend: steady

Flux provides optimized GPU kernels for tensor and expert parallelism, targeting efficient training and inference of dense and Mixture-of-Experts (MoE) models. It integrates with PyTorch and overlaps communication with computation to maximize GPU utilization during distributed training. The library targets the parallelism strategies used in large-scale LLM training.
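
The idea of overlapping communication with computation can be illustrated with a small CPU-only simulation. This is a conceptual sketch, not Flux's API: Flux does this inside GPU kernels, whereas here a background thread stands in for the transfer and a plain-Python matrix product stands in for a GEMM. The names `fetch_shard`, `gather_then_compute`, and `overlapped` are hypothetical and exist only for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_shard(shards, i, delay=0.01):
    """Hypothetical stand-in for receiving shard i from a peer GPU."""
    time.sleep(delay)  # simulated transfer latency
    return shards[i]

def matmul(a, b):
    """Plain-Python matrix product, standing in for a GPU GEMM."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def gather_then_compute(shards, weight):
    """Baseline: wait for every shard, then run one large product."""
    rows = []
    for i in range(len(shards)):
        rows.extend(fetch_shard(shards, i))
    return matmul(rows, weight)

def overlapped(shards, weight):
    """Prefetch shard i+1 on a background thread while computing on shard i."""
    out = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_shard, shards, 0)
        for i in range(len(shards)):
            current = pending.result()  # wait only for the shard needed now
            if i + 1 < len(shards):
                pending = pool.submit(fetch_shard, shards, i + 1)
            out.extend(matmul(current, weight))  # runs while the next shard transfers
    return out

# Two row shards of an activation matrix and an identity weight (toy data).
shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weight = [[1, 0], [0, 1]]
```

Both functions return the same result. The overlapped version only changes when work happens: each shard's product starts as soon as that shard arrives, so transfer time for the next shard is hidden behind computation on the current one.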
