pytorch/ao

PyTorch-native library providing quantization and sparsity techniques to optimize LLMs (Llama, Gemma, DeepSeek) for faster training and inference.

Velocity (7d): +3.0 ★/day · Trend: steady

TorchAO is the official PyTorch library for model optimization through quantization and sparsity. Its techniques include float8 training, for a 1.5x pre-training speedup; quantization-aware training (QAT), which recovers accuracy lost to post-training quantization; and int4 weight-only quantization, for a 1.89x inference speedup with a 58% memory reduction. The library supports transformer models such as Llama and Gemma and includes optimizations for mixture-of-experts (MoE) architectures using MXFP8 precision.
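
To make the idea behind int4 weight-only quantization concrete, here is a minimal sketch in plain Python. It is not TorchAO's API or kernel code: the function names, the symmetric scheme, and the group size of 4 are assumptions chosen for illustration. Each group of weights shares one floating-point scale, and every weight is stored as a 4-bit integer code in [-8, 7].

```python
# Illustrative sketch only: symmetric group-wise int4 weight quantization.
# The names and the group size are hypothetical, not taken from TorchAO.

def quantize_int4_groupwise(weights, group_size=4):
    """Return (codes, scales): int4 codes in [-8, 7] and one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; use 1.0 if the group is all zeros.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the int4 range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_groupwise(codes, scales, group_size=4):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.12, -0.50, 0.33, 0.07, 1.40, -0.90, 0.05, 0.70]
codes, scales = quantize_int4_groupwise(weights)
restored = dequantize_int4_groupwise(codes, scales)

# Worst-case reconstruction error; bounded by half of the largest group scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scales, max_error)
```

Storing 4-bit codes plus a small number of scales instead of 16-bit or 32-bit floats is where the memory saving comes from in schemes of this kind. The speedup and memory figures quoted above are the project's own and are not reproduced by this toy.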
