UbiquitousLearning/mllm
A C++ inference engine for running multimodal large language models on mobile and edge devices.

Velocity (7d): +1.5 ★/day · Trend: steady
This repository provides a fast, lightweight inference engine for running multimodal LLMs on mobile and edge devices. It supports quantization methods such as Rotation Quantization and integrates with hardware backends, including Qualcomm QNN for NPU execution and CUDA for Jetson platforms. The project also includes an Android implementation built on a client-server architecture, and it supports models including Qwen, DeepSeek, and LLaMA variants.