qualcomm/nexa-sdk
A high-performance on-device inference SDK for running frontier LLMs and VLMs on NPU, GPU, and CPU across Android, Windows, and Linux.

NexaSDK runs multimodal AI models such as Qwen3-VL, DeepSeek-OCR, and Gemma-3n locally on edge devices. Its runtime covers GPU, NPU, and CPU hardware across mobile (Android/iOS), desktop (Windows/Linux), and IoT platforms. The SDK provides Python and C++ APIs with day-0 support for newly released models, and it targets developers building on-device AI applications that must keep energy consumption low.