Tencent/AngelSlim
A compression toolkit for large models that supports quantization, distillation, and speculative decoding to optimize LLM inference.

AngelSlim provides model compression capabilities designed specifically for large language models. It implements quantization at bit widths from 1.25-bit to 4-bit, speculative decoding frameworks such as DFlare and D-Cut, and knowledge distillation for both full-precision and quantized models. It integrates with llama.cpp through kernel contributions and supports a range of model architectures, including Qwen, Hunyuan, DeepSeek, and multimodal vision-language models (VLMs).
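The bit widths above describe how many bits are used to store each weight. To make "low-bit weight quantization" concrete, here is a minimal sketch of generic symmetric, group-wise round-to-nearest quantization at 4 bits in plain Python. It is an illustration under stated assumptions, not AngelSlim's algorithm or API: the function names `quantize_group`, `dequantize_group`, and `quantize_tensor` are hypothetical, and sub-2-bit settings such as 1.25-bit typically rely on different encodings that this sketch does not cover.

```python
"""Generic symmetric, per-group round-to-nearest weight quantization.

Illustrative sketch only; NOT AngelSlim's implementation or API.
All function names below are hypothetical.
"""
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map one group of floats to signed ints in [-qmax, qmax] plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for a symmetric 4-bit code
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group to avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Reconstruct approximate floats from integer codes and the group scale."""
    return [v * scale for v in q]


def quantize_tensor(
    weights: List[float], group_size: int = 4, bits: int = 4
) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into fixed-size groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]
```

For example, `quantize_tensor([0.12, -0.5, 0.33, 0.07, 1.4, -0.2, 0.0, 0.9], group_size=4)` yields two groups, each storing four 4-bit integer codes and one float scale. Using a separate scale per group keeps a single large weight (1.4 here) from inflating the rounding error of every other weight in the tensor; the per-weight reconstruction error is bounded by half of its group's scale.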