Vahe1994/AQLM
A PyTorch library for extreme compression of LLMs via additive quantization, reaching roughly 1 bit per parameter of storage, with specialized inference kernels.

AQLM implements Additive Quantization of Language Models, a method for extreme compression of LLMs (down to roughly 1 bit per parameter). It provides CUDA kernels for efficient inference with quantized models, integrates with vLLM for accelerated serving, and includes PV-Tuning for fine-tuning compressed models. The library supports arbitrary 8-dimensional codebooks on GPU and offers Colab tutorials for running pre-quantized models.
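
To make the "bits per parameter" figure concrete, here is a minimal pure-Python sketch of the general idea behind additive quantization: a group of weights is stored as a few integer codes, and decoded as the sum of the codebook vectors those codes select. This is an illustration only, not the AQLM API. The function names (`bits_per_parameter`, `decode_group`), the random codebooks, and the chosen sizes are all hypothetical, and the bit count ignores the overhead of storing codebooks and scales.

```python
import random


def bits_per_parameter(num_codebooks, code_bits, group_size):
    """Code storage per weight, ignoring codebook and scale overhead (assumption)."""
    return num_codebooks * code_bits / group_size


def decode_group(codes, codebooks):
    """Reconstruct one weight group as the sum of the selected codebook vectors."""
    group_size = len(codebooks[0][0])
    out = [0.0] * group_size
    for code, codebook in zip(codes, codebooks):
        vec = codebook[code]  # one vector of length group_size per codebook
        for i in range(group_size):
            out[i] += vec[i]
    return out


# Hypothetical configuration: 2 codebooks, 8-bit codes, groups of 8 weights.
rng = random.Random(0)
num_codebooks, code_bits, group_size = 2, 8, 8

# Random codebooks stand in for learned ones; each has 2**code_bits entries.
codebooks = [
    [[rng.gauss(0.0, 1.0) for _ in range(group_size)] for _ in range(2 ** code_bits)]
    for _ in range(num_codebooks)
]

# One code per codebook is all that is stored for this group of 8 weights.
codes = [rng.randrange(2 ** code_bits) for _ in range(num_codebooks)]
weights = decode_group(codes, codebooks)

print(bits_per_parameter(num_codebooks, code_bits, group_size))
print(len(weights))
```

Under these assumptions, two 8-bit codes per group of 8 weights cost 2 bits per parameter, and a single 8-bit code per group of 8 would cost 1 bit per parameter. Learning the codebooks and choosing the codes is the hard part of the method and is not shown here.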