
Vahe1994/AQLM

A PyTorch library for extreme compression of LLMs via additive quantization, reaching storage of roughly 1 bit per parameter, with specialized inference kernels.

Star velocity (7d): +1.5 ★ / day · Trend: steady

AQLM implements Additive Quantization for Large Language Models, enabling extreme model compression (e.g., roughly 1 bit per parameter). It provides CUDA kernels for efficient inference with quantized models, integrates with vLLM for accelerated serving, and includes PV-tuning for fine-tuning compressed models. The library's GPU kernels support arbitrary 8-dimensional codebooks, and Colab tutorials show how to run pre-quantized models.
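To make the idea of additive quantization concrete, here is a minimal pure-Python sketch, not AQLM's actual code or API. It assumes each small group of weights is stored as one integer code per codebook and reconstructed as the sum of the selected codebook vectors. The function names `decode_group` and `bits_per_parameter` are hypothetical, and the bit count ignores the cost of storing the codebooks themselves and any per-channel scales.

```python
import math


def decode_group(codes, codebooks):
    """Reconstruct one weight group as the sum of one vector per codebook.

    codes:     list of ints; codes[m] selects a vector from codebooks[m]
    codebooks: list of codebooks; each codebook is a list of vectors
               that all have the same length (the group size)
    """
    group_size = len(codebooks[0][0])
    out = [0.0] * group_size
    for code, codebook in zip(codes, codebooks):
        vector = codebook[code]
        for i in range(group_size):
            out[i] += vector[i]
    return out


def bits_per_parameter(num_codebooks, codebook_size, group_size):
    """Code storage cost per weight (assumption: codebooks and scales ignored)."""
    return num_codebooks * math.log2(codebook_size) / group_size


# Toy example: 2 codebooks, 2 entries each, groups of 2 weights.
toy_codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, 0.25]],
]
reconstructed = decode_group([1, 1], toy_codebooks)
print(reconstructed)  # [-0.5, 1.25]

# Illustrative configurations for 8-dimensional groups:
print(bits_per_parameter(1, 2**16, 8))  # 2.0
print(bits_per_parameter(1, 2**8, 8))   # 1.0
```

Under these assumptions, the per-parameter cost depends only on the number of codebooks, the codebook size, and the group size, which is why 8-dimensional groups with a single small codebook can land near 1 bit per weight.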
