
OpenGVLab/OmniQuant

LLM quantization framework that compresses model weights (W4/W3/W2) and activations for efficient inference on GPUs and mobile devices.

Velocity (7d): +0.9 ★ / day
Trend: steady

OmniQuant provides quantization algorithms for large language models. It supports weight-only quantization (W4A16/W3A16/W2A16) and weight-activation quantization (W6A6/W4A4), where WxAy denotes x-bit weights and y-bit activations. The repository includes a model zoo with pre-quantized checkpoints for LLaMA, LLaMA-2-Chat, OPT, Falcon, and Mixtral-8x7B. It also integrates with MLC-LLM to deploy quantized models on GPUs and mobile hardware, where memory and compute are limited.
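
To make the "W4" in W4A16 concrete, here is a minimal sketch of 4-bit weight-only quantization using plain min-max round-to-nearest. It is an illustration of the bit-width notation only, not OmniQuant's algorithm or API; the function names `quantize_weights` and `dequantize_weights` are hypothetical and do not come from the repository.

```python
def quantize_weights(weights, n_bits=4):
    """Map floats to integer codes in [0, 2**n_bits - 1] (asymmetric, round-to-nearest).

    Illustrative only: real frameworks typically quantize per channel or per
    group and calibrate the clipping range instead of using raw min/max.
    """
    q_max = 2 ** n_bits - 1                   # 15 for 4-bit weights
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max
    if scale == 0:                            # all weights equal; avoid division by zero
        scale = 1.0
    zero_point = round(-w_min / scale)        # integer code that represents 0.0
    codes = [
        min(q_max, max(0, round(w / scale) + zero_point))
        for w in weights
    ]
    return codes, scale, zero_point


def dequantize_weights(codes, scale, zero_point):
    """Recover approximate float weights from integer codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.8, -0.1, 0.0, 0.3, 0.7]
codes, scale, zero_point = quantize_weights(weights, n_bits=4)
restored = dequantize_weights(codes, scale, zero_point)
```

In a weight-only setting such as W4A16, only the stored weights are reduced to integer codes like these; activations stay in 16-bit floating point, so the weights are dequantized (or handled by a fused kernel) at inference time.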
