ModelCloud/GPTQModel

The Kitchen-Sink Approach to Shrinking LLMs

Because no one wants to maintain five different quantization pipelines for five different chip vendors.

★ 1.2k stars | Python | Inference · Serving | LLMOps · Eval

What it does

GPTQModel is a Python toolkit that compresses large language models so they run on cheaper, smaller, or non-NVIDIA hardware. It bundles multiple quantization methods and formats (GPTQ, AWQ, ParoQuant, GGUF, FP8, EXL3, and others) behind a single interface, then dispatches inference through Hugging Face Transformers, vLLM, or SGLang. The project targets NVIDIA CUDA, AMD ROCm, Intel XPU, Huawei Ascend NPU, and Intel/AMD/Apple CPUs, with dedicated kernels for each.
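To make "compressing" a model concrete, here is a minimal sketch of group-wise 4-bit weight quantization in plain Python. It uses simple round-to-nearest, which is a deliberately simplified baseline and not GPTQ's error-compensating algorithm or any other method the toolkit ships. The function names and the group size of 128 are assumptions chosen for illustration; none of this is GPTQModel's API.

```python
from typing import List, Tuple

# One quantized group: integer codes plus the scale and offset needed to decode.
Group = Tuple[List[int], float, float]


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group (illustrative baseline only)."""
    levels = (1 << bits) - 1              # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where the range would be zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    """Map integer codes back to approximate float weights."""
    codes, scale, lo = group
    return [lo + c * scale for c in codes]


def quantize(weights: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Group]) -> List[float]:
    """Concatenate the decoded groups back into one flat list."""
    restored: List[float] = []
    for group in groups:
        restored.extend(dequantize_group(group))
    return restored
```

Each group stores its 4-bit codes plus one scale and one offset, so smaller groups track the weights more closely at the cost of more metadata. Methods such as GPTQ go further than this sketch by using calibration data to choose codes that reduce the layer's output error.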

The interesting bit

Rather than shipping a bloated wheel of pre-compiled CUDA binaries, the toolkit JIT-compiles kernels on demand, reportedly shrinking the install size by about 300×. It also treats Mixture-of-Experts models as a first-class problem, offering data-parallel multi-GPU quantization, disk offloading to curb CPU RAM spikes, and a “FailSafe” mode that smooths out uneven expert routing during calibration.
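The compile-on-demand idea can be sketched as a build-once cache: nothing is compiled at install time, and a kernel is built the first time a given name and target architecture are requested. The sketch below is an assumption about the general pattern, not GPTQModel's implementation. `get_kernel`, `BUILD_LOG`, and the placeholder kernel body are hypothetical, and the "build" step is a stand-in for invoking a real compiler (PyTorch, for instance, offers `torch.utils.cpp_extension.load` for JIT-building extensions; whether this project uses it is not stated here).

```python
import functools
from typing import Callable, List, Tuple

# Records every simulated build so the caching behaviour is observable.
BUILD_LOG: List[Tuple[str, str]] = []


@functools.lru_cache(maxsize=None)
def get_kernel(name: str, arch: str) -> Callable[[List[float]], List[float]]:
    """Build a kernel on first request, then serve it from the cache.

    Hypothetical helper: the append below stands in for a real compile step.
    """
    BUILD_LOG.append((name, arch))

    def kernel(values: List[float]) -> List[float]:
        # Placeholder computation; a real kernel would run on the device.
        return [2.0 * v for v in values]

    return kernel
```

Calling `get_kernel("dequant_int4", "sm_80")` twice triggers one build, while requesting the same kernel for a different architecture triggers a second. The trade-off is a slower first call in exchange for a much smaller package.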

Key highlights

JIT-compiled CUDA kernels; Marlin inference support for NVIDIA Turing and newer
Native torch kernels for Huawei Ascend NPU (added in v7.0.0)
Hardware-optimized CPU paths for Intel AMX, AVX2, and AVX512
Multi-GPU data-parallel quantization with Python 3.13 free-threading (nogil)
Automatic disk offloading and “FailSafe” smoothing for uneven MoE expert routing
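
Uneven expert routing is easy to see with a small count. In a Mixture-of-Experts layer, each calibration token is sent to only its top-k experts, so rarely chosen experts may see too few samples to be quantized well. The sketch below only measures that imbalance; it does not reproduce how "FailSafe" smooths it, which is not described here. All function names and the `min_tokens` threshold are hypothetical.

```python
from typing import Dict, List


def top_k_experts(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return ranked[:k]


def expert_token_counts(router_scores: List[List[float]], k: int = 2) -> Dict[int, int]:
    """Count how many calibration tokens are routed to each expert."""
    num_experts = len(router_scores[0])
    counts = {expert: 0 for expert in range(num_experts)}
    for scores in router_scores:
        for expert in top_k_experts(scores, k):
            counts[expert] += 1
    return counts


def undersampled_experts(counts: Dict[int, int], min_tokens: int) -> List[int]:
    """List experts that received fewer than min_tokens calibration tokens."""
    return sorted(e for e, n in counts.items() if n < min_tokens)
```

Feeding this a batch where most tokens prefer the same two experts will flag the remaining experts as undersampled, which is the situation a calibration-time safeguard has to handle.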

Verdict

Worth a look if you are shipping quantized models to heterogeneous hardware or need MoE support beyond the usual single-GPTQ-path scripts. Skip it if you are happily locked into one vendor stack and a single quantization format.

Frequently asked

What is ModelCloud/GPTQModel?: A Python toolkit that bundles multiple LLM quantization methods behind a single interface, built because no one wants to maintain five different quantization pipelines for five different chip vendors.
Is GPTQModel open source?: Yes — ModelCloud/GPTQModel is an open-source project tracked on heatdrop.
What language is GPTQModel written in?: ModelCloud/GPTQModel is primarily written in Python.
How popular is GPTQModel?: ModelCloud/GPTQModel has 1.2k stars on GitHub.
Where can I find GPTQModel?: ModelCloud/GPTQModel is on GitHub at https://github.com/ModelCloud/GPTQModel.