
google-ai-edge/LiteRT-LM

Google's production-ready inference runtime for running large language models on edge devices with GPU/NPU acceleration.

Velocity (7d): +13 ★/day · Trend: steady

LiteRT-LM is an open-source C++ inference framework for deploying LLMs on edge devices, including Android, iOS, Web, and IoT hardware. It provides hardware-accelerated execution via GPU and NPU backends, supports multiple model families (Gemma, Llama, Phi-4, Qwen), and includes speculative decoding, multimodal inputs, and function calling (tool use) for agentic workflows. The project offers Kotlin, Swift, and JavaScript APIs, plus a CLI tool for cross-platform model execution.
