
InternLM/lmdeploy

A toolkit for compressing, deploying, and serving large language models with GPU acceleration and quantization.

Velocity (7d): +7.2 ★ / day · Trend: steady

LMDeploy is an open-source toolkit for LLM inference and serving. It provides model compression through quantization (supporting symmetric and asymmetric 4-bit modes), optimized CUDA kernels, and the TurboMind inference engine, which builds on FasterTransformer. The toolkit supports a wide range of LLMs, including the Llama, Llama2, Llama3, CodeLlama, InternLM, and Qwen families, for deployment across hardware platforms.
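To make the difference between the symmetric and asymmetric 4-bit modes concrete, here is a minimal sketch of the two schemes in plain Python. It illustrates the general technique only and is not LMDeploy's implementation or API. The function names, the per-list (rather than per-group) scaling, and the sample weights are all assumptions made for the example.

```python
# Illustrative sketch only: generic 4-bit quantization, not LMDeploy code.
# All function names here are hypothetical.

def quantize_symmetric(values, bits=4):
    """Signed integers in [-qmax, qmax]; the zero-point is fixed at 0."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_symmetric(q, scale):
    return [x * scale for x in q]


def quantize_asymmetric(values, bits=4):
    """Unsigned integers in [0, qmax] plus an integer zero-point."""
    qmax = 2 ** bits - 1  # 15 for 4-bit
    # Extend the range to include 0.0 so that zero maps to an integer code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = max(0, min(qmax, round(-lo / scale)))
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Made-up weights skewed toward positive values.
weights = [0.1, 0.5, 0.9, 1.3, 1.5]

q_sym, s_sym = quantize_symmetric(weights)
q_asym, s_asym, zp = quantize_asymmetric(weights)

err_sym = max_abs_error(weights, dequantize_symmetric(q_sym, s_sym))
err_asym = max_abs_error(weights, dequantize_asymmetric(q_asym, s_asym, zp))

print("symmetric :", q_sym, "max error", err_sym)
print("asymmetric:", q_asym, "max error", err_asym)
```

With skewed inputs like these, the symmetric scheme leaves the negative half of its integer range unused, while the asymmetric scheme spreads all 16 codes over the observed range at the cost of storing a zero-point. Production toolkits typically apply such schemes per channel or per group of weights rather than to a whole tensor.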
