huawei-csl/SINQ

Shrink 236B-parameter models in five minutes, no calibration needed

SINQ compresses large language models to low precision without calibration data, using Sinkhorn matrix normalization to keep outliers from ruining the quantization party.

★625 stars · Python · Inference · Serving · Language Models

What it does

SINQ (Sinkhorn-Normalized Quantization) is a calibration-free method for compressing large language models down to 2–8 bits. It aims to shrink memory footprints dramatically: the authors claim a 236B-parameter DeepSeek model can run on a single GPU with roughly 110 GB of memory instead of ~472 GB, with less than one point of perplexity lost on WikiText2 and C4. The project provides both a standalone implementation and a native Hugging Face Transformers integration via SinqConfig.
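The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the ~472 GB baseline corresponds to 16-bit weights (2 bytes per parameter) and counts raw weight storage only, ignoring scale factors, zero points, and activations. The helper name `weight_memory_gb` is made up for illustration; the page does not say which bit width or overhead produces the ~110 GB figure.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in GB (1 GB = 1e9 bytes).

    Assumption: counts only the weights themselves, not quantization
    metadata (scales, zero points) or runtime buffers.
    """
    return num_params * bits_per_weight / 8 / 1e9


params = 236e9  # 236B parameters, as quoted above

for bits in (16, 8, 4, 3, 2):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, 16-bit weights come to 472 GB, which matches the quoted baseline, while 4-bit and 3-bit raw storage come to 118 GB and 88.5 GB, bracketing the claimed ~110 GB.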

The interesting bit

Instead of a single scale factor per weight dimension, SINQ applies separate row and column scaling, then iteratively rebalances variance using Sinkhorn matrix normalization. This spreads quantization error away from outliers rather than letting it cluster, which the authors say preserves quality even at 3-bit precision.
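To make the dual-scaling idea concrete, here is a toy pure-Python sketch. It is not the SINQ implementation: it assumes a simple alternating scheme that divides each row, then each column, by its standard deviation, followed by plain round-to-nearest quantization of the normalized matrix. The function names (`dual_scale_normalize`, `quantize_rtn`, `row_std_imbalance`), the iteration count, and the quantizer are all illustrative choices.

```python
import random
from statistics import pstdev


def dual_scale_normalize(W, iters=16, eps=1e-12):
    """Alternately rescale rows and columns by their standard deviation.

    Returns (W_hat, row_scale, col_scale) such that
    W[i][j] ~= row_scale[i] * W_hat[i][j] * col_scale[j].
    """
    rows, cols = len(W), len(W[0])
    W_hat = [row[:] for row in W]
    row_scale = [1.0] * rows
    col_scale = [1.0] * cols
    for _ in range(iters):
        for i in range(rows):  # rebalance row variances
            s = pstdev(W_hat[i]) + eps
            row_scale[i] *= s
            W_hat[i] = [v / s for v in W_hat[i]]
        for j in range(cols):  # rebalance column variances
            s = pstdev([W_hat[i][j] for i in range(rows)]) + eps
            col_scale[j] *= s
            for i in range(rows):
                W_hat[i][j] /= s
    return W_hat, row_scale, col_scale


def quantize_rtn(M, bits=4):
    """Uniform round-to-nearest over the whole matrix; returns dequantized values."""
    flat = [v for row in M for v in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against a constant matrix
    return [[lo + round((v - lo) / step) * step for v in row] for row in M]


def row_std_imbalance(M):
    """Ratio of the largest to the smallest row standard deviation."""
    stds = [pstdev(row) for row in M]
    return max(stds) / min(stds)


random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
W[0] = [50.0 * v for v in W[0]]  # inject one outlier row

W_hat, r, c = dual_scale_normalize(W)
Q = quantize_rtn(W_hat, bits=4)
# Undo the scaling after quantization to get back to the original weight space.
W_deq = [[r[i] * Q[i][j] * c[j] for j in range(8)] for i in range(8)]

print("row-std imbalance before:", row_std_imbalance(W))
print("row-std imbalance after: ", row_std_imbalance(W_hat))
```

The point of the sketch is the bookkeeping: the row and column scales are kept alongside the low-bit matrix, so the outlier row's magnitude lives in `r` rather than stretching the quantization grid that every other row shares.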

Key highlights

Claims to quantize Qwen3-14B in ~21 seconds and DeepSeekV2.5-236B in ~5 minutes on one GPU.
Two modes: calibration-free sinq and calibrated asinq, both supporting symmetric and asymmetric quantization including NF4.
Native Hugging Face Transformers integration, though only the calibration-free mode is supported there.
Quantized models can be saved and reloaded in sharded safetensors format without keeping the original FP weights around.
A “pre-SINQ” workflow is available for GGUF quantization.

Caveats

Speed and quality comparisons against HQQ, AWQ, and GPTQ are self-reported in the README; independent benchmarks are not shown.
The Hugging Face Transformers integration supports only the calibration-free SINQ method, not the calibrated A-SINQ variant.

Verdict

Worth a look if you need to squeeze oversized LLMs onto limited GPU memory and prefer to skip calibration dataset curation. If you demand rigorous third-party benchmarks before trusting a quantizer, hold off for now.

Frequently asked

What is huawei-csl/SINQ? SINQ compresses large language models to low precision without calibration data, using Sinkhorn matrix normalization to keep outliers from ruining the quantization party.
Is SINQ open source? Yes, huawei-csl/SINQ is open source, released under the Apache-2.0 license.
What language is SINQ written in? huawei-csl/SINQ is primarily written in Python.
How popular is SINQ? huawei-csl/SINQ has 625 stars on GitHub.
Where can I find SINQ? huawei-csl/SINQ is on GitHub at https://github.com/huawei-csl/SINQ.