vacancy/Synchronized-BatchNorm-PyTorch
A PyTorch module implementing synchronized batch normalization, which computes batch statistics across all devices during multi-GPU training.

This repository provides a synchronized batch normalization implementation. It differs from PyTorch’s built-in BatchNorm in that the mean and standard deviation are reduced across all devices during training, rather than computed separately on each device. With DataParallel, the built-in layer normalizes each GPU’s slice of the batch using only that slice’s statistics; synchronizing them makes the statistics reflect the whole batch. This matters most for tasks with small per-GPU batch sizes, such as object detection, where per-device statistics are noisy and standard batch normalization can degrade performance. On a single GPU or on CPU only, the module behaves identically to PyTorch’s built-in implementation.
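To make the difference between per-device and synchronized statistics concrete, here is a minimal pure-Python sketch. It is not this repository's code: the helper names (local_stats, synced_mean_var) and the toy shards are hypothetical, and it assumes one common way to synchronize, where each device contributes its count, sum, and sum of squares, and the combined totals give the whole-batch mean and biased variance.

```python
from statistics import fmean, pvariance


def local_stats(shard):
    """Return (count, sum, sum of squares) for one device's slice of the batch."""
    return len(shard), sum(shard), sum(x * x for x in shard)


def synced_mean_var(shards):
    """Combine per-device partial sums into the whole-batch mean and biased variance."""
    counts, sums, sq_sums = zip(*(local_stats(s) for s in shards))
    n = sum(counts)
    mean = sum(sums) / n
    # Var[x] = E[x^2] - (E[x])^2, computed over the full batch.
    var = sum(sq_sums) / n - mean * mean
    return mean, var


# Hypothetical batch of four values split across two devices.
shards = [[1.0, 2.0], [3.0, 10.0]]

# Unsynchronized: each device sees only its own slice.
per_device = [(fmean(s), pvariance(s)) for s in shards]

# Synchronized: statistics of the whole batch, recovered from partial sums.
synced = synced_mean_var(shards)

# Reference: statistics computed directly on the concatenated batch.
full_batch = [x for s in shards for x in s]
reference = (fmean(full_batch), pvariance(full_batch))

print("per-device:", per_device)
print("synced:    ", synced)
print("reference: ", reference)
```

In this toy case the two devices would normalize with quite different means and variances, while the combined partial sums reproduce the full-batch statistics. With only two samples per device the per-device estimates are very noisy, which is the small-batch situation described above.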