pytorch/ao

PyTorch’s built-in shrink ray for neural networks

TorchAO gives PyTorch models native quantization and sparsity so they train faster and fit on everything from H100 clusters to iPhones.

★ 2.9k stars | Python | Inference · Serving | ML Frameworks | LLMOps · Eval

What it does

TorchAO is PyTorch’s toolkit for compressing models through quantization and sparsity across the full training-to-serving lifecycle. It offers float8 and MXFP8 recipes for pre-training, int4 weight-only compression for inference, and Quantization-Aware Training to recover accuracy lost by post-training quantization. The library is built to stay inside standard PyTorch workflows, cooperating with torch.compile, FSDP2, and most HuggingFace architectures.
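To make "int4 weight-only compression" concrete, here is a minimal pure-Python sketch of symmetric, group-wise 4-bit quantization. It is an illustration of the general idea only, not TorchAO's implementation: the function names, the group size of 4, and the symmetric scheme are all assumptions chosen for readability, and real kernels pack two 4-bit codes per byte and run on tensors.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # value range of a signed 4-bit integer


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one group of weights to int4 codes plus a shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 7; fall back to 1.0 for an all-zero group.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int4 codes."""
    return [c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = 4) -> List[Tuple[List[int], float]]:
    """Split a weight row into groups and quantize each one independently."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups: List[Tuple[List[int], float]]) -> List[float]:
    """Concatenate the dequantized groups back into one row."""
    restored: List[float] = []
    for codes, scale in groups:
        restored.extend(dequantize_group(codes, scale))
    return restored


if __name__ == "__main__":
    row = [0.7, -0.35, 0.1, 0.0, 14.0, -7.0, 2.0, 1.0]
    groups = quantize_row(row)
    print(groups)
    print(dequantize_row(groups))
```

Giving each group its own scale keeps a single outlier (such as the 14.0 above) from wiping out the resolution of every other weight in the row. Running a quantize-then-dequantize round trip inside the training forward pass is also the basic idea behind Quantization-Aware Training, though this sketch does not handle gradients.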

The interesting bit

Because it is native PyTorch, the same quantize_ API that prepares a model for QAT fine-tuning with LoRA also produces models that can be exported through ExecuTorch to an iPhone 15 Pro or served via vLLM and HuggingFace Transformers, without swapping abstraction layers.

Key highlights

Float8 training reportedly reaches a 1.5× speedup on Llama-3.1-70B pre-training, while int4 quantization delivers 1.89× faster inference and 58% less memory on Llama-3-8B.
QAT recipes recover 96% of the accuracy lost on hellaswag and 68% of the perplexity lost on wikitext for Llama3, measured against plain post-training quantization.
Integrations cover HuggingFace Transformers, vLLM, Diffusers, Unsloth, Axolotl, and ExecuTorch, letting the same compressed model move from training cluster to edge device.
Supports semi-structured 2:4 sparsity and block sparsity, with a prototype MXFP8 MoE training path claiming ~1.45× speedup on Llama4 Scout MoE layers.
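The "semi-structured 2:4 sparsity" mentioned above means that in every contiguous block of four weights, at most two are nonzero. A minimal pure-Python sketch of magnitude-based 2:4 pruning follows. It is not TorchAO's code: the function name prune_2_of_4 and the tie-breaking rule are assumptions made for illustration, and a real workflow would operate on tensors, store the result in a compressed layout, and usually fine-tune afterwards to recover accuracy.

```python
from typing import List


def prune_2_of_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in each block of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        block = row[start:start + 4]
        # Rank positions by descending magnitude; ties go to the lower index.
        keep = sorted(range(4), key=lambda i: (-abs(block[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(block))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.2, 1.0, 1.0, 1.0, 1.0]))
    # prints [0.0, -0.9, 0.5, 0.0, 1.0, 1.0, 0.0, 0.0]
```

The fixed two-of-four pattern is what lets hardware skip the zeros predictably, in contrast to unstructured sparsity where zeros can fall anywhere.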

Caveats

The README explicitly labels some workflows as prototype (e.g., MXFP8 MoE training), so check the workflow docs before committing a production pipeline to the newest paths.
Certain acceleration paths rely on an optional external kernel dependency, MSLK, meaning out-of-the-box performance may vary by install.
A published compatibility table for dependency versions suggests the usual PyTorch-ecosystem version-matching headaches apply.
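Before consulting the compatibility table, it helps to know which versions are actually installed. The sketch below uses only the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and it assumes the packages are distributed under the names "torch" and "torchao".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "torchao"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

The printed pair can then be compared by hand against the project's published table; the sketch deliberately does not encode any version requirements itself.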

Verdict

If you are already inside the PyTorch/HuggingFace stack and want to trade a little precision for speed or memory, TorchAO is the logical starting point. If you live in JAX, TensorRT, or fully custom C++ serving stacks, there is little here for you.

Frequently asked

What is pytorch/ao? TorchAO gives PyTorch models native quantization and sparsity so they train faster and fit on everything from H100 clusters to iPhones.
Is ao open source? Yes, pytorch/ao is an open-source project tracked on heatdrop.
What language is ao written in? pytorch/ao is primarily written in Python.
How popular is ao? pytorch/ao has 2.9k stars on GitHub.
Where can I find ao? pytorch/ao is on GitHub at https://github.com/pytorch/ao.