ModelTC/LightLLM
A Python-based inference and serving framework for large language models, built around a lightweight design and high-speed performance.

Velocity (7d): +3.9 ★ / day · Trend: steady
LightLLM is a framework for running and serving large language models, with a focus on performance and scalability. It draws on optimizations from established projects such as vLLM, FasterTransformer, TGI, and FlashAttention to provide efficient inference. The framework supports a range of LLM architectures, including GPT and LLaMA variants.