UbiquitousLearning/mllm

A C++ inference engine for running multimodal large language models on mobile and edge devices.

Velocity (7d): +1.5 ★/day · Trend: steady

This repository provides a fast and lightweight inference engine designed to run multimodal LLMs on mobile and edge devices. It supports quantization methods like Rotation Quantization and integrates with backends such as QNN for NPU execution and CUDA for Jetson platforms. The project includes an Android implementation using a client-server architecture and supports models including Qwen, DeepSeek, and LLaMA variants.
