Tencent/AngelSlim

Compressing DeepSeek and Qwen without the usual toolchain sprawl

AngelSlim integrates quantization, speculative decoding, and distillation so you can shrink and serve massive models from a single toolkit.

★ 1.5k stars | Python | Inference · Serving | LLMOps · Eval

Velocity (7d): +19 ★ / day · Trend: accelerating

What it does

AngelSlim is a Python toolkit that wraps mainstream and research-grade compression algorithms into one framework. It quantizes, prunes, and accelerates inference for large language models, vision-language models, diffusion models, and audio models, covering workhorses like Qwen3, DeepSeek-R1, FLUX, and SDXL. The project also publishes pre-compressed weights and contributes kernels upstream to llama.cpp.
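To make "quantizes" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a generic textbook scheme, not AngelSlim's API or any of its algorithms; the function names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.30, 0.07, 0.88, -0.55]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.5f}", f"max_error={max_error:.5f}")
```

Each weight is stored as one signed byte plus a single shared scale, which is where the memory saving comes from. Lower-bit formats follow the same idea with fewer code levels and, typically, finer-grained scales.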

The interesting bit

The toolkit is not merely glue code; it ships novel methods like Sherry and Tequila for sub-2-bit quantization, plus speculative-decoding variants such as DFlare that claim up to 5.52× end-to-end speedup. It also trains and deploys Eagle3 draft models across modalities, and even ships an offline Android demo of a 1.25-bit translation model.
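For readers new to speculative decoding, the sketch below shows the basic greedy draft-and-verify loop with toy stand-in models. It is not Eagle3, SpecExit, or DFlare, and it does not use AngelSlim's code; `target_next`, `draft_next`, and the arithmetic token rule are hypothetical, chosen so the draft model is usually but not always right.

```python
def target_next(tokens):
    """Stand-in for the large model: a deterministic next-token rule."""
    return (tokens[-1] * 3 + 1) % 7


def draft_next(tokens):
    """Stand-in for the small draft model: wrong whenever the last token is 4."""
    return 0 if tokens[-1] == 4 else target_next(tokens)


def greedy_decode(prompt, num_new):
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, num_new, k=4):
    """Draft k tokens cheaply, then keep the prefix the target agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks every proposed position. A real engine does
        #    this in one batched forward pass, so we count it as one pass.
        target_passes += 1
        for i in range(k):
            expected = target_next(tokens + proposal[:i])
            if proposal[i] != expected:
                tokens.extend(proposal[:i])  # accepted prefix
                tokens.append(expected)      # target's correction
                break
        else:
            tokens.extend(proposal)             # everything accepted
            tokens.append(target_next(tokens))  # bonus token from the target
    return tokens[:goal], target_passes


spec_tokens, passes = speculative_decode([1], num_new=12)
base_tokens = greedy_decode([1], num_new=12)
print(spec_tokens == base_tokens, passes)
```

The output matches plain greedy decoding token for token, while the target model is invoked fewer times than there are generated tokens. Real speedups depend on how often the draft is accepted and on how cheap the draft model is relative to the target.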

Key highlights

Supports FP8, INT8, INT4, NVFP4, ternary, and 1.25-bit quantization through a unified API.
Bundles speculative-decoding engines: Eagle3, SpecExit, and DFlare with layer-wise fusion.
Claims to fit very large models such as Qwen3-235B and DeepSeek-R1 on a single GPU after compression.
Publishes ready-to-use quantized weights on Hugging Face and ModelScope.
Contributes hardware kernels back to the community, including an STQ1_0 1.25-bit kernel PR for llama.cpp.
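A back-of-the-envelope calculation shows why the bit widths listed above matter for the single-GPU claim. The sketch assumes weight storage is simply parameters × bits ÷ 8 and ignores quantization scales, activations, and the KV cache, so real footprints will be higher; the helper `weight_memory_gb` is illustrative, and the 235B figure is taken from the Qwen3-235B model name.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 235e9  # parameter count implied by the name Qwen3-235B

for label, bits in [("BF16", 16), ("FP8/INT8", 8), ("INT4", 4), ("1.25-bit", 1.25)]:
    print(f"{label:>9}: {weight_memory_gb(NUM_PARAMS, bits):7.1f} GB")
```

Under these assumptions the weights shrink from roughly 470 GB at 16 bits to about 117.5 GB at 4 bits and under 40 GB at 1.25 bits, which is the difference between needing a multi-GPU node and fitting on one large accelerator.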

Caveats

Many documentation links point to Chinese-language pages, so non-Chinese speakers may need a translator.
The README is dominated by Tencent’s own model families, making it unclear how smoothly custom or non-listed architectures work.

Verdict

Grab this if you want a batteries-included compression pipeline for production inference on popular open models. Skip it if you are looking for a minimal, research-only quantization library or need broad support for niche architectures outside the Tencent/Qwen ecosystem.

Frequently asked

What is Tencent/AngelSlim? AngelSlim integrates quantization, speculative decoding, and distillation so you can shrink and serve massive models from a single toolkit.
Is AngelSlim open source? Yes. Tencent/AngelSlim is an open-source project tracked on heatdrop.
What language is AngelSlim written in? Tencent/AngelSlim is primarily written in Python.
How popular is AngelSlim? Tencent/AngelSlim has 1.5k stars on GitHub, and its star growth is currently accelerating.
Where can I find AngelSlim? Tencent/AngelSlim is on GitHub at https://github.com/Tencent/AngelSlim.