
tczhangzhi/pytorch-distributed

A quickstart and benchmark collection for distributed PyTorch training using DataParallel, Distributed, Apex, and Horovod.

1.7k stars · Python · ML Frameworks
Velocity (7d): +0.7 ★/day · Trend: steady

This repository provides code examples and benchmarks for single-machine, multi-GPU distributed training in PyTorch. It covers implementations using nn.DataParallel, torch.distributed, torch.multiprocessing, NVIDIA Apex, and Horovod, along with SLURM cluster configurations. The benchmarks compare training speed on ImageNet with Tesla V100 GPUs and find that Apex and Horovod achieve similar performance, while DataParallel is notably slower.
