OpenBMB/MiniCPM-V

Pocket-sized multimodal AI that sees, hears, and speaks in real time

The project aims to squeeze multimodal understanding (vision, video, and even real-time speech) into models small enough to run natively on a handset.

★ 26k stars · Python · Image, video, and audio language models

Velocity over the last 7 days: +11 stars per day; trend: cooling.

What it does

MiniCPM-V is a family of open-weight multimodal LLMs built for edge deployment. The MiniCPM-V line handles image, video, and text understanding, while the MiniCPM-o line adds streaming audio input and speech output for real-time, full-duplex conversation. The latest releases, MiniCPM-V 4.6 (1.3B parameters) and MiniCPM-o 4.5 (9B parameters), target iOS, Android, and HarmonyOS, and the edge adaptation code is published.

The interesting bit

The 1.3B MiniCPM-V 4.6 uses an intra-ViT early compression technique that the team says cuts visual encoding computation by more than half, and it can mix 4× and 16× visual token compression rates to trade accuracy for speed per task. Meanwhile, MiniCPM-o 4.5 treats audio and video as non-blocking streams, so the model can listen and speak simultaneously rather than waiting for turn-based round trips.
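To make the 4×/16× trade-off concrete, here is a minimal sketch of the token arithmetic, assuming a generic ViT with 14-pixel patches and 448×448 image slices. Those sizes, the per-slice rate assignment, and the helper names (`patch_count`, `visual_tokens`, `mixed_budget`) are hypothetical illustrations, not MiniCPM-V 4.6's actual configuration or API; the description above does not say how the model decides where each rate applies.

```python
# Hypothetical illustration of visual token budgets under 4x / 16x compression.
# Patch size and slice sizes are assumptions, NOT MiniCPM-V 4.6's real config.


def patch_count(height: int, width: int, patch: int = 14) -> int:
    """Number of ViT patches for an image slice (assumes dims divisible by patch)."""
    return (height // patch) * (width // patch)


def visual_tokens(patches: int, rate: int) -> int:
    """Tokens left after compressing `patches` by an integer `rate` (e.g. 4 or 16)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return patches // rate


def mixed_budget(slices: list[tuple[int, int, int]], patch: int = 14) -> int:
    """Total visual tokens for slices given as (height, width, rate) tuples."""
    return sum(visual_tokens(patch_count(h, w, patch), r) for h, w, r in slices)


if __name__ == "__main__":
    n = patch_count(448, 448)  # 32 * 32 = 1024 patches
    print(n, visual_tokens(n, 4), visual_tokens(n, 16))  # 1024 256 64
    # Hypothetical mix: one detail-heavy slice at 4x, three context slices at 16x.
    print(mixed_budget([(448, 448, 4)] + [(448, 448, 16)] * 3))  # 448
```

At these assumed sizes, a 1024-patch slice shrinks to 256 tokens at 4× and 64 tokens at 16×, so reserving 4× for the one slice that needs detail yields 448 tokens for four slices instead of the 1024 an all-4× setting would cost.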

Key highlights

MiniCPM-V 4.6 is claimed to outperform the larger Gemma4-E2B-it and to deliver roughly 1.5× the token throughput of the nominally smaller Qwen3.5-0.8B.
Visual token compression is configurable (4×/16× mixed), letting the same model throttle image encoding cost depending on the job.
MiniCPM-o 4.5 supports full-duplex multimodal live streaming—simultaneous video, audio, text, and speech without blocking.
The project ships open-source edge code for mobile platforms and offers a public API tier for MiniCPM-V 4.6.
Support for MiniCPM-V 4.5 is already merged into official llama.cpp, vLLM, and LLaMA-Factory; Ollama and SGLang support is in progress.
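The full-duplex streaming idea can be illustrated with a toy `asyncio` sketch in which one coroutine keeps consuming input chunks while another emits output, so replies begin before the input stream ends. This is a generic concurrency pattern, not MiniCPM-o's implementation; the chunk labels, the one-reply-per-chunk rule, and the function names (`listen`, `speak`, `full_duplex`) are hypothetical.

```python
# Conceptual sketch of full-duplex streaming with asyncio. NOT MiniCPM-o's code:
# chunk labels and the "reply to every chunk" rule are hypothetical.
import asyncio


async def listen(inbox: asyncio.Queue, chunks: list[str], log: list[str]) -> None:
    """Push incoming audio/video chunks into the inbox as they 'arrive'."""
    for chunk in chunks:
        log.append(f"in:{chunk}")
        await inbox.put(chunk)
        await asyncio.sleep(0)  # yield control, as a live stream would between chunks
    await inbox.put(None)  # end-of-stream marker


async def speak(inbox: asyncio.Queue, log: list[str]) -> None:
    """Emit one output fragment per chunk instead of waiting for the whole turn."""
    while (chunk := await inbox.get()) is not None:
        log.append(f"out:{chunk}")


async def full_duplex(chunks: list[str]) -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    log: list[str] = []
    # Both coroutines run concurrently, so output starts before input finishes.
    await asyncio.gather(listen(inbox, chunks, log), speak(inbox, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(full_duplex(["frame0", "audio0", "frame1"])))
```

Running it produces a log in which `out:` entries appear between `in:` entries rather than only after all input has arrived, which is the non-blocking behavior a strictly turn-based loop would lack.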

Caveats

The real-time web demo can suffer from network latency, and a Docker image for local deployment is still listed as coming soon.
Support in Ollama and SGLang for the latest models has not yet landed upstream and remains a work in progress.

Verdict

If you are building offline-capable mobile apps or low-latency vision assistants, this project is worth a look; if you only need server-side batch inference, larger cloud-hosted models are likely simpler.

Frequently asked

What is OpenBMB/MiniCPM-V?: A project that aims to squeeze multimodal understanding (vision, video, and even real-time speech) into models small enough to run natively on a handset.
Is MiniCPM-V open source?: Yes — OpenBMB/MiniCPM-V is open source, released under the Apache-2.0 license.
What language is MiniCPM-V written in?: OpenBMB/MiniCPM-V is primarily written in Python.
How popular is MiniCPM-V?: OpenBMB/MiniCPM-V has 26k stars on GitHub, and its star growth is currently cooling off.
Where can I find MiniCPM-V?: OpenBMB/MiniCPM-V is on GitHub at https://github.com/OpenBMB/MiniCPM-V.