Tiiny-AI/PowerInfer

A high-speed LLM inference engine for local deployment on consumer-grade GPUs, featuring sparsity optimization and activation locality techniques.

Velocity (7d): +11 ★/day · Trend: steady

PowerInfer is an LLM inference engine built to run large language models locally on consumer-grade GPUs. It exploits activation locality and model sparsity to speed up inference. The project also covers related models, such as SmallThinker and the TurboSparse variants, and supports quantized model deployment in resource-constrained environments, including mobile devices.
