Tiiny-AI/PowerInfer
A high-speed LLM inference engine for local deployment on consumer-grade GPUs, featuring sparsity optimization and activation locality techniques.

Velocity (7d): +11 ★/day · Trend: steady
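PowerInfer is a specialized LLM inference engine for running large language models locally on consumer-grade GPUs. It exploits activation locality: in the feed-forward layers, a small set of "hot" neurons fires on most inputs, while the remaining "cold" neurons fire only for particular inputs. PowerInfer keeps hot neurons on the GPU, computes cold neurons on the CPU, and uses activation predictors with sparsity-aware operators to skip neurons that are unlikely to fire. The project also provides related models, such as SmallThinker and the TurboSparse variants, and supports quantized model deployment for resource-constrained environments, including mobile devices.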
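The hot/cold idea can be illustrated with a minimal, pure-Python sketch. It assumes a single ReLU feed-forward layer `W_down · ReLU(W_up · x)`, profiles how often each neuron fires on a few calibration inputs, assigns the most frequently active neurons to a "GPU" budget, and evaluates only predicted-active neurons, splitting the work into a "GPU" part and a "CPU" part that are summed. Every name here (`activation_frequency`, `split_hot_cold`, `predict_active`, `hybrid_ffn`, `gpu_budget`) is hypothetical and is not PowerInfer's code or API. In particular, `predict_active` is an oracle that evaluates the exact pre-activation, so it demonstrates that skipping inactive neurons preserves the result, not the speedup a cheap learned predictor would provide.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def activation_frequency(w_up: Matrix, samples: Matrix) -> List[float]:
    """Offline profiling: fraction of calibration inputs on which each neuron fires."""
    counts = [0] * len(w_up)
    for x in samples:
        for i, row in enumerate(w_up):
            if dot(row, x) > 0.0:
                counts[i] += 1
    return [c / len(samples) for c in counts]


def split_hot_cold(freq: List[float], gpu_budget: int) -> Tuple[List[int], List[int]]:
    """Hypothetical placement: the `gpu_budget` most frequently active neurons are hot (GPU), the rest cold (CPU)."""
    order = sorted(range(len(freq)), key=lambda i: freq[i], reverse=True)
    return sorted(order[:gpu_budget]), sorted(order[gpu_budget:])


def predict_active(x: Sequence[float], w_up: Matrix) -> Set[int]:
    """Oracle stand-in for a learned activation predictor (assumption, not PowerInfer's predictor)."""
    return {i for i, row in enumerate(w_up) if dot(row, x) > 0.0}


def sparse_ffn(x: Sequence[float], w_up: Matrix, w_down: Matrix, neurons: List[int]) -> List[float]:
    """W_down · ReLU(W_up · x), restricted to the given neuron indices."""
    out = [0.0] * len(w_down)
    for i in neurons:
        h = relu(dot(w_up[i], x))
        if h == 0.0:
            continue  # an inactive neuron contributes nothing to the output
        for j, down_row in enumerate(w_down):
            out[j] += down_row[i] * h
    return out


def dense_ffn(x: Sequence[float], w_up: Matrix, w_down: Matrix) -> List[float]:
    """Reference result: evaluate every neuron."""
    h = [relu(dot(row, x)) for row in w_up]
    return [dot(down_row, h) for down_row in w_down]


def hybrid_ffn(x: Sequence[float], w_up: Matrix, w_down: Matrix,
               hot: List[int], cold: List[int]) -> List[float]:
    """Simulated split: hot neurons on the 'GPU', cold neurons on the 'CPU', active ones only."""
    active = predict_active(x, w_up)
    gpu_part = sparse_ffn(x, w_up, w_down, [i for i in hot if i in active])
    cpu_part = sparse_ffn(x, w_up, w_down, [i for i in cold if i in active])
    return [g + c for g, c in zip(gpu_part, cpu_part)]


# Toy layer: 5 neurons, model dimension 2 (W_DOWN is d_model x n_neurons).
W_UP = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1]]
W_DOWN = [[1, 2, 3, 4, 5], [0, 1, 0, 1, 0]]
SAMPLES = [[1, 1], [2, 1], [1, -1], [3, 2]]

if __name__ == "__main__":
    freq = activation_frequency(W_UP, SAMPLES)
    hot, cold = split_hot_cold(freq, gpu_budget=2)
    x = [2, -1]
    print(freq)                                    # [1.0, 0.75, 0.0, 0.25, 0.75]
    print(hot, cold)                               # [0, 1] [2, 3, 4]
    print(hybrid_ffn(x, W_UP, W_DOWN, hot, cold))  # [11.0, 1.0]
    print(dense_ffn(x, W_UP, W_DOWN))              # [11.0, 1.0]
```

For the input `[2, -1]`, only neurons 0, 3 and 4 fire, so the "GPU" computes one hot neuron and the "CPU" two cold ones, and the summed result matches the dense layer. The design choice the sketch highlights is that placement is decided offline from activation statistics, while the per-token decision of which neurons to compute is made online.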