bytedance/byteps
BytePS is a distributed training framework for deep neural networks that supports TensorFlow, Keras, PyTorch, and MXNet.

BytePS is a high-performance distributed training framework designed to accelerate deep neural network training across multiple GPUs and machines. It provides communication optimizations over TCP and RDMA networks to achieve high scaling efficiency—for example, ~90% scaling efficiency with 256 GPUs on BERT-large training. The framework integrates with major deep learning libraries including TensorFlow, Keras, PyTorch, and MXNet, offering APIs like DistributedDataParallel for PyTorch and supporting gradient compression.