pytorch/ao
PyTorch-native library providing quantization and sparsity techniques to optimize LLMs (Llama, Gemma, DeepSeek) for faster training and inference.

TorchAO is the official PyTorch library for model optimization through quantization and sparsity. Its techniques include float8 training, with a reported 1.5x pre-training speedup; quantization-aware training (QAT), which recovers accuracy lost to post-training quantization; and int4 weight-only quantization, with a reported 1.89x inference speedup and 58% memory reduction. The library supports transformer models such as Llama and Gemma and includes optimizations for mixture-of-experts (MoE) architectures using MXFP8 precision.
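
To make the idea behind int4 weight-only quantization concrete, here is a minimal pure-Python sketch of group-wise symmetric quantization with two 4-bit codes packed per byte. It is a conceptual illustration only, not TorchAO's implementation or API: the function names, the `group_size` parameter, and the low-nibble-first packing layout are all assumptions made for this example.

```python
# Conceptual sketch only: group-wise symmetric int4 weight quantization.
# NOT TorchAO's code or API; every name below is hypothetical.


def quantize_int4_groupwise(weights, group_size=4):
    """Quantize a flat list of floats to int4 codes in [-8, 7], one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; use 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the signed 4-bit range
    return codes, scales


def dequantize_int4_groupwise(codes, scales, group_size=4):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


def pack_int4(codes):
    """Pack two signed 4-bit codes per byte, low nibble first (assumed layout)."""
    padded = list(codes) + ([0] if len(codes) % 2 else [])
    packed = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed, count):
    """Inverse of pack_int4: recover `count` signed 4-bit codes."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1 (two's complement).
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


example_weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
example_codes, example_scales = quantize_int4_groupwise(example_weights)
example_packed = pack_int4(example_codes)
example_restored = dequantize_int4_groupwise(
    unpack_int4(example_packed, len(example_codes)), example_scales
)
```

In this sketch each weight takes half a byte plus a share of its group's scale, compared with two bytes for a 16-bit float. The 58% memory reduction quoted above is a measured whole-model figure and is not derived from this arithmetic.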