tczhangzhi/pytorch-distributed
A quickstart and benchmark collection for distributed PyTorch training using DataParallel, Distributed, Apex, and Horovod.

This repository provides code examples and benchmarks for distributed PyTorch training in single-machine, multi-GPU settings. It covers nn.DataParallel, torch.distributed, torch.multiprocessing, NVIDIA Apex, and Horovod, and also includes SLURM cluster configurations. Benchmarks of ImageNet training on Tesla V100 GPUs show that Apex and Horovod reach similar training speeds, while DataParallel is notably slower.
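The torch.distributed plus torch.multiprocessing approach can be illustrated with a minimal sketch. This is not code from the repository. It assumes a CPU-only machine with the "gloo" backend, a tiny synthetic dataset, and a single linear layer so it runs anywhere. The helper names `_free_port`, `train`, and `run_demo` are hypothetical. On V100-class GPUs, one would typically use the "nccl" backend, move each process's model to `cuda:rank`, and pass `device_ids=[rank]` to DDP.

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def _free_port() -> int:
    # Hypothetical helper: ask the OS for an unused TCP port for the rendezvous.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train(rank: int, world_size: int, queue) -> None:
    # One process per replica; all processes join the same group via env:// vars.
    # Assumption: "gloo" on CPU. On GPUs, "nccl" plus device_ids=[rank] is usual.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    torch.manual_seed(0)  # same synthetic data and initial weights on every rank

    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    # DistributedSampler hands each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    # DDP broadcasts rank 0's weights at construction and all-reduces
    # (averages) gradients across ranks during backward().
    model = DDP(nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

    # Report a weight checksum; matching values mean the replicas stayed in sync.
    queue.put((rank, model.module.weight.sum().item()))
    dist.destroy_process_group()


def run_demo(world_size: int = 2) -> dict:
    # Rendezvous settings read by init_process_group's default env:// method;
    # spawned children inherit these environment variables.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(_free_port())
    ctx = mp.get_context("spawn")
    queue = ctx.SimpleQueue()
    # mp.spawn starts world_size processes, calling train(rank, world_size, queue).
    mp.spawn(train, args=(world_size, queue), nprocs=world_size, join=True)
    return dict(queue.get() for _ in range(world_size))


if __name__ == "__main__":
    print(run_demo(2))  # one checksum per rank; the values should match
```

Running one process per replica, with gradients averaged by all-reduce, avoids the single-process design of nn.DataParallel, which re-replicates the model across GPUs on every forward pass. That overhead is one commonly cited reason DataParallel trails the multi-process approaches.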