tysam-code/hlb-CIFAR10
A PyTorch implementation achieving near-world-record single-GPU CIFAR-10 training speed using optimized CNN architectures.

This repository implements a heavily optimized convolutional neural network for CIFAR-10 image classification. It was originally inspired by David Page’s work but has been rewritten for rapid experimentation. The goal is to minimize training time on a single GPU through architectural tweaks, hyperparameter tuning, memory format optimizations, and Dirac initialization. The project holds, or has held, the world record for single-GPU CIFAR-10 training time on an A100, showing how far training throughput can be pushed on this benchmark.
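As a rough illustration of the Dirac initialization mentioned above, the sketch below builds a convolution kernel that initially passes each input channel through unchanged. It is written in plain Python so the idea is visible without any dependencies; it is not the repository's code, and the helper names `dirac_kernel` and `conv2d_same` are hypothetical. The sketch assumes an odd kernel size. In PyTorch, `torch.nn.init.dirac_` provides this kind of initialization for convolution weights.

```python
def dirac_kernel(out_channels, in_channels, k):
    """Build an (out, in, k, k) kernel with a single 1.0 at the spatial
    center of w[c][c], so output channel c copies input channel c.
    Assumes k is odd so that a unique center exists."""
    center = k // 2
    w = [[[[0.0] * k for _ in range(k)] for _ in range(in_channels)]
         for _ in range(out_channels)]
    for c in range(min(out_channels, in_channels)):
        w[c][c][center][center] = 1.0
    return w


def conv2d_same(x, w):
    """Naive zero-padded cross-correlation (what deep learning libraries
    call convolution). x has shape (in, H, W); w has shape (out, in, k, k).
    The output keeps the spatial size H x W."""
    k = len(w[0][0])
    pad = k // 2
    height, width = len(x[0]), len(x[0][0])
    out = []
    for w_out in w:
        plane = [[0.0] * width for _ in range(height)]
        for ci, w_k in enumerate(w_out):
            for i in range(height):
                for j in range(width):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - pad, j + dj - pad
                            # Positions outside the image count as zero padding.
                            if 0 <= ii < height and 0 <= jj < width:
                                plane[i][j] += w_k[di][dj] * x[ci][ii][jj]
        out.append(plane)
    return out


# Two-channel 2x2 input; a Dirac-initialized 3x3 layer leaves it unchanged.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]
y = conv2d_same(x, dirac_kernel(2, 2, 3))
```

Because the layer starts out as an identity map on the channels it covers, the signal passes through it untouched at the beginning of training. Any output channels beyond the number of input channels start at zero.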