Is AutoGPTQ open source?

Yes — AutoGPTQ/AutoGPTQ is open source, released under the MIT license.

What language is AutoGPTQ written in?

AutoGPTQ/AutoGPTQ is primarily written in Python.

How popular is AutoGPTQ?

AutoGPTQ/AutoGPTQ has 5.1k stars on GitHub.

Where can I find AutoGPTQ?

AutoGPTQ/AutoGPTQ is on GitHub at https://github.com/AutoGPTQ/AutoGPTQ.

AutoGPTQ/AutoGPTQ

The Friendly LLM Quantizer That Outlived Its Maintainers

A Hugging Face-friendly wrapper around GPTQ that squeezes LLMs into 4-bit weights for faster, smaller inference—though the maintainers now suggest you look elsewhere.

★ 5.1k stars · Python · Inference · Serving · Language Models · ML Frameworks

What it does

AutoGPTQ implements GPTQ weight-only quantization for large language models, shrinking them to 4-bit weights so they fit on smaller GPUs and often run faster than FP16. It exposes a Hugging Face-compatible API for quantizing, saving, loading, and pushing models to the Hub. The package also bundles downstream evaluation tasks so you can check whether the compressed model still performs adequately on your target task.
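The quantize, save, and load flow described above can be sketched roughly as follows. This is a minimal sketch modeled on the usage shown in the project's README, assuming the auto_gptq and transformers packages are installed and a CUDA GPU is available. The helper names build_quantize_kwargs and quantize_and_reload, the default values, and the device string are illustrative assumptions, not part of the library.

```python
def build_quantize_kwargs(bits=4, group_size=128, desc_act=False):
    """Collect settings for BaseQuantizeConfig (hypothetical helper, illustrative defaults)."""
    if bits not in (2, 3, 4, 8):
        raise ValueError("bits is expected to be one of 2, 3, 4, 8")
    return {"bits": bits, "group_size": group_size, "desc_act": desc_act}


def quantize_and_reload(model_id, out_dir, calibration_texts, device="cuda:0"):
    """Quantize a causal LM, save it, and load the quantized copy (hypothetical helper)."""
    # Imported lazily so build_quantize_kwargs works without these packages installed.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    # Calibration examples: tokenized text that the quantizer runs through the model.
    examples = [tokenizer(text) for text in calibration_texts]

    config = BaseQuantizeConfig(**build_quantize_kwargs())
    model = AutoGPTQForCausalLM.from_pretrained(model_id, config)
    model.quantize(examples)
    model.save_quantized(out_dir, use_safetensors=True)

    # Load the saved 4-bit weights onto the target device for inference.
    return AutoGPTQForCausalLM.from_quantized(out_dir, device=device)
```

Because the project is unmaintained, treat this as a way to read the API's shape rather than a recommended starting point for new work.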

The interesting bit

The README's most prominent feature is a giant warning banner stating that the project is unmaintained and recommending GPTQModel instead. Before that exit, the project had integrated deeply with the Hugging Face ecosystem (Transformers, PEFT, and Optimum), which made 4-bit inference accessible without leaving familiar APIs. It also supported a wide hardware spread: CUDA, ROCm, and even Intel Gaudi 2, though getting there required compiling platform-specific kernels.

Key highlights

Weight-only 4-bit GPTQ quantization with configurable group size and activation ordering.
Inference speedups over FP16 in some configs (e.g., Llama-7B on A100) and the ability to run models that otherwise OOM, like GPT-J-6B on a 12 GB RTX 3060.
Direct integration with 🤗 Transformers, Optimum, and PEFT for training and inference.
Marlin kernel support (as of v0.7.0) for faster int4×fp16 matrix multiplication.
Extensible modeling base: subclass BaseGPTQForCausalLM to add support for new model architectures.
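To make "4-bit weights with a configurable group size" concrete, here is a small pure-Python sketch. It uses plain round-to-nearest quantization per group, which is a simplification: GPTQ itself additionally adjusts the remaining weights to compensate for each rounding error, and that step is not shown. The function names, the storage assumption of one 16-bit scale plus one 4-bit zero-point per group, and the round figure of 6e9 parameters are all assumptions made for illustration, not AutoGPTQ's actual code or file format.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group (not the full GPTQ update)."""
    qmax = (1 << bits) - 1
    # Extend the range to include 0 so the integer zero-point stays within [0, qmax].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    span = hi - lo
    scale = span / qmax if span > 0 else 1.0  # all-zero group: any scale works
    zero = round(-lo / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Quantize a weight row in consecutive groups, each with its own scale and zero."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Reassemble an approximate row from per-group codes, scales, and zero-points."""
    restored = []
    for codes, scale, zero in groups:
        restored.extend(dequantize_group(codes, scale, zero))
    return restored


def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Bits per weight including per-group metadata (assumed storage layout)."""
    return bits + (scale_bits + zero_bits) / group_size


def weight_memory_gb(n_params, bits_per_weight):
    """Weights-only footprint in GB (1e9 bytes); ignores activations and KV cache."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    row = [0.1, -0.4, 0.25, 0.9, 10.0, -8.0, 3.0, 0.5]
    print(dequantize_row(quantize_row(row, group_size=4)))
    print(weight_memory_gb(6e9, 16))
    print(weight_memory_gb(6e9, effective_bits_per_weight()))
```

Under these assumptions, a model with roughly 6e9 parameters needs about 12 GB for FP16 weights alone but only about 3.1 GB at 4 bits with a group size of 128, which is consistent with the GPT-J-6B on a 12 GB card example above. Smaller groups track local weight ranges more closely at the cost of more per-group metadata.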

Caveats

The project is officially unmaintained; the README explicitly points to GPTQModel for bug fixes and new model support.
No macOS support; Linux and Windows only, and NVIDIA Maxwell or older GPUs are unsupported.
The Triton backend is Linux-only and excludes 3-bit quantization.

Verdict

Curious about how GPTQ quantization fits into the Hugging Face stack? AutoGPTQ is a readable reference implementation, but don't start new production work here. If you need active maintenance or support for newer models, the README itself tells you to use GPTQModel instead.
