ModelCloud/GPTQModel
A quantization toolkit for compressing large language models, with hardware acceleration across NVIDIA, AMD, and Intel GPUs as well as CPUs.

GPTQModel provides quantization and compression for large language models (LLMs), enabling efficient deployment on a range of hardware platforms. It works with the HuggingFace, vLLM, and SGLang inference engines. Supported targets include NVIDIA CUDA, AMD ROCm, Huawei Ascend NPU, Intel XPU, and Intel/AMD/Apple CPUs, which makes the toolkit suited to compressing models for production deployment.
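To make "quantization" concrete, here is a minimal sketch of the basic arithmetic behind low-bit weight quantization: mapping a group of floating-point weights to 4-bit integers with a shared scale and zero point, then mapping them back. This is plain round-to-nearest (RTN) quantization, not the GPTQ algorithm itself, which additionally uses approximate second-order information to compensate for rounding error as it quantizes. The function names are hypothetical illustrations and are not part of the GPTQModel API.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights (illustrative, not GPTQ)."""
    qmax = (1 << bits) - 1  # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    # One scale per group maps the float range onto [0, qmax]; guard against a constant group.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = round(-w_min / scale)  # integer zero point so that w_min maps near 0
    # Round each weight to the nearest level and clamp into the representable range.
    q = [min(max(round(w / scale) + zero, 0), qmax) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Recover approximate float weights from integer codes."""
    return [(v - zero) * scale for v in q]


weights = [0.12, -0.53, 0.98, -0.07, 0.41, -0.88, 0.25, 0.03]
q, scale, zero = quantize_group(weights, bits=4)
restored = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.4f} zero={zero} max_err={max_err:.4f}")
```

Each weight is now stored in 4 bits plus a small per-group overhead (scale and zero point), at the cost of a reconstruction error of at most about one quantization step. Methods such as GPTQ aim to choose the integer codes more carefully so that this error hurts model accuracy less than naive rounding does.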