Xilinx's side project for shrinking neural nets
A PyTorch library that lets you quantize layers individually without rewriting your model from scratch.

What it does
Brevitas provides quantized drop-in replacements for standard PyTorch layers—QuantConv2d, QuantLSTM, QuantMultiheadAttention, and others—so you can apply post-training quantization (PTQ) or quantization-aware training (QAT) without abandoning familiar APIs. Each tensor (weights, inputs, bias, outputs) gets its own tunable quantization settings.
The interesting bit
The granularity is the selling point: you can tune bit-width and scale per-layer rather than accepting one-size-fits-all quantization. There’s also a reference PTQ pipeline for ImageNet models if you want to see how a torchvision model behaves at 4-bit versus 8-bit.
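To get a feel for why per-layer bit-width matters, here is a minimal pure-Python sketch of symmetric uniform "fake" quantization (quantize, then dequantize) applied with a different bit-width per layer. It is not Brevitas code and does not reflect how the library implements its quantizers; the function names, layer names, and weight values are all made up for illustration.

```python
def fake_quantize(values, bit_width):
    """Symmetric uniform quantize-then-dequantize of a list of floats.

    Illustrative only: one scale per tensor, derived from the max magnitude.
    """
    q_max = 2 ** (bit_width - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)                    # nearest integer level
        q = max(-q_max, min(q_max, q))          # clamp to the signed range
        dequantized.append(q * scale)           # map back to float
    return dequantized


def max_abs_error(values, bit_width):
    """Largest absolute difference between original and fake-quantized values."""
    dequantized = fake_quantize(values, bit_width)
    return max(abs(a - b) for a, b in zip(values, dequantized))


# Hypothetical layers and per-layer bit-widths (assumptions, not real model data).
layer_weights = {
    "conv1": [0.50, -0.25, 0.10, -0.75],
    "fc": [0.03, -0.90, 0.45, 0.60],
}
layer_bit_widths = {"conv1": 8, "fc": 4}

errors = {
    name: max_abs_error(weights, layer_bit_widths[name])
    for name, weights in layer_weights.items()
}

for name, err in errors.items():
    print(f"{name}: {layer_bit_widths[name]}-bit, max abs error {err:.5f}")
```

With round-to-nearest, the error per value is bounded by half the scale, so a layer kept at 8-bit loses far less precision than one pushed to 4-bit. Choosing which layers can tolerate the lower setting is the kind of tradeoff that per-layer configuration is meant to expose.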
Key highlights
- Drop-in quantized variants of common torch.nn layers under brevitas.nn
- Independent quantization config for weights, activations, bias, and outputs
- Supports both PTQ and QAT workflows
- Reference example for ImageNet classification PTQ included
- Available on PyPI; supports Python 3.9–3.12 and PyTorch 1.12–2.8
Caveats
- Explicitly labeled a research project, not an official Xilinx product
- PyTorch versions beyond 2.8 are untested, so bleeding-edge installs may break
Verdict
Worth a look if you’re shipping models to FPGA or edge hardware and need fine-grained control over quantization tradeoffs. Skip it if you want a polished, vendor-supported toolchain with guaranteed upstream compatibility.