
Tencent/AngelSlim

A large model compression toolkit supporting quantization, distillation, and speculative decoding techniques for optimized LLM inference.

Velocity (7d): +3.8 ★/day · Trend: steady

AngelSlim provides model compression capabilities specifically designed for large language models. The toolkit implements techniques including quantization (1.25-bit to 4-bit), speculative decoding frameworks like DFlare and D-Cut, and knowledge distillation for full-precision and quantized models. It integrates with llama.cpp through kernel contributions and supports compression across diverse model architectures including Qwen, Hunyuan, DeepSeek, and multimodal VLMs.
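To make the low-bit quantization mentioned above more concrete, here is a minimal sketch of symmetric round-to-nearest 4-bit quantization for a single group of weights. It is a generic illustration only, not AngelSlim's API or its actual algorithm; the function names `quantize_int4` and `dequantize` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one weight group to signed 4-bit codes with a shared scale.

    Hypothetical helper for illustration; not taken from AngelSlim.
    """
    qmax = 7  # signed 4-bit holds [-8, 7]; the symmetric range [-7, 7] is used here
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One floating-point scale per group maps the largest weight onto qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights from integer codes and the group scale."""
    return [c * scale for c in codes]


# Illustrative values only (assumed, not from any real model).
group = [0.12, -0.70, 0.33, 0.04]
codes, scale = quantize_int4(group)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

With round-to-nearest, the per-weight reconstruction error stays within half of the group scale, which is why smaller groups (and therefore smaller scales) tend to preserve accuracy at the cost of storing more scales.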
