
ModelCloud/GPTQModel

A quantization toolkit for compressing large language models, with hardware acceleration on NVIDIA, AMD, and Intel GPUs as well as CPUs.

Velocity (7d): +1.6 ★/day · Trend: steady

GPTQModel quantizes and compresses large language models so they can be deployed efficiently in production on a range of hardware platforms. It supports quantization through the HuggingFace, vLLM, and SGLang inference engines, and targets NVIDIA CUDA, AMD ROCm, Huawei Ascend NPU, Intel XPU, and Intel/AMD/Apple CPUs.
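To make "quantization" concrete, here is a minimal sketch of the general idea: storing weights as small integers plus a scale and offset. It uses plain round-to-nearest quantization on a single group of weights, which is simpler than the GPTQ method and is not GPTQModel's API. The function names, the sample weights, and the 4-bit setting are all assumptions chosen for illustration.

```python
# Illustrative only: plain round-to-nearest quantization of one weight group.
# This is NOT the GPTQ algorithm and NOT GPTQModel's API; every name below
# is hypothetical.

def quantize_group(weights, bits=4):
    """Map floats to unsigned integer codes in [0, 2**bits - 1]."""
    qmax = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Recover approximate floats from integer codes."""
    return [c * scale + zero for c in codes]


weights = [-0.75, -0.25, 0.0, 0.25, 0.75]  # made-up sample values
codes, scale, zero = quantize_group(weights, bits=4)
restored = dequantize_group(codes, scale, zero)

# Worst-case reconstruction error; bounded by about half of one step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now needs 4 bits instead of 16 or 32, at the cost of a reconstruction error of at most about half a quantization step. Toolkits in this space generally use more careful methods to keep that error from degrading model quality.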
