Is mlx-vlm open source?

Yes — Blaizzy/mlx-vlm is open source, released under the MIT license.

What language is mlx-vlm written in?

Blaizzy/mlx-vlm is primarily written in Python.

How popular is mlx-vlm?

Blaizzy/mlx-vlm has 5.2k stars on GitHub, and its star growth is currently accelerating (+14 stars per day over the last 7 days).

Where can I find mlx-vlm?

Blaizzy/mlx-vlm is on GitHub at https://github.com/Blaizzy/mlx-vlm.


Blaizzy/mlx-vlm

Vision models on Apple Silicon, now with a speed obsession

MLX-VLM crams speculative decoding, continuous batching, and KV cache quantization into a Mac-native toolkit for running multimodal models locally.

★ 5.2k stars · Python · Image · Video · Audio · Inference · Serving · ML Frameworks


Velocity (7d): +14 ★ / day · Trend: ↗ accelerating

What it does

MLX-VLM is a Python package for inference and fine-tuning of vision-language models, and their “omni” cousins that also handle audio and video, on Apple Silicon via Apple’s MLX framework. It wraps model loading, chat templating, and generation into a CLI, Python API, Gradio UI, and FastAPI server.

The interesting bit

The project treats speculative decoding as a first-class feature, not an afterthought. It supports three distinct drafter families: DFlash for Qwen3.5, Google’s multi-token prediction assistant for Gemma 4, and EAGLE-3 speculators. The README even publishes measured speedups, up to 3.94× for Gemma 4 26B-A4B at block size 4, which is unusually concrete for this kind of tooling.
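To make the idea of a drafter and a block size concrete, here is a minimal sketch of greedy draft-and-verify speculative decoding. It is not MLX-VLM's implementation, and the DFlash, multi-token prediction, and EAGLE-3 drafters each work differently in detail. Every name in it (`toy_target`, `toy_draft`, `speculative_decode`, `block_size`) is hypothetical, and the "models" are toy integer functions standing in for real networks.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the large, slow target model (hypothetical)."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the small drafter: agrees with the target except
    when the context length is a multiple of 5 (hypothetical)."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 5 else (guess + 1) % 11


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    block_size: int = 4,
) -> Tuple[List[Token], int]:
    """Return (new_tokens, verification_rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The drafter proposes block_size tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(block_size):
            proposal.append(draft(out + proposal))

        # 2. The target verifies the block. A real system scores all
        #    positions in one batched forward pass; this sketch calls the
        #    target per position but counts the block as a single round.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every draft token matched, so the target adds a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + max_new], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [1, 2], max_new=20)
    print(f"{len(tokens)} tokens in {rounds} verification rounds")
```

Because every accepted token is checked against the target's own greedy choice, the output matches plain greedy decoding. Any speedup comes from needing fewer target rounds than tokens, which depends on how often the drafter agrees with the target.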

Key highlights

Runs fully offline on Apple Silicon with MLX-native quantization and KV cache tricks (TurboQuant, APC, continuous batching)
Supports image, audio, and combined multimodal inputs through a unified generate API
FastAPI server with per-request thinking-mode overrides and automatic prefix caching
Thinking budget controls for reasoning models like Qwen3.5, with forced exit tokens once the budget is exhausted
Fine-tuning support and distributed inference across multiple Macs
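The thinking-budget highlight above can be illustrated with a small sketch of a decoding loop that forces an exit token once the reasoning budget is spent. This is an assumption-laden illustration, not MLX-VLM's code: the function name, its parameters, the `</think>` and `<eos>` markers, and the toy model are all hypothetical.

```python
from typing import Callable, List


def toy_model(ctx: List[str]) -> str:
    """Hypothetical model: reasons indefinitely until it sees the exit
    marker, then answers and stops."""
    if "</think>" not in ctx:
        return "hmm"
    return "<eos>" if ctx[-1] == "answer" else "answer"


def generate_with_thinking_budget(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    thinking_budget: int,
    max_tokens: int,
    think_end: str = "</think>",  # assumed exit marker
    eos: str = "<eos>",           # assumed end-of-sequence marker
) -> List[str]:
    out: List[str] = []
    thinking = True  # assume generation starts inside a reasoning block
    thinking_tokens = 0
    while len(out) < max_tokens:
        if thinking and thinking_tokens >= thinking_budget:
            # Budget exhausted: override the model and force the exit token.
            tok = think_end
        else:
            tok = next_token(prompt + out)
        out.append(tok)
        if tok == eos:
            break
        if thinking:
            if tok == think_end:
                thinking = False  # the model (or the override) left reasoning
            else:
                thinking_tokens += 1
    return out


if __name__ == "__main__":
    print(generate_with_thinking_budget(toy_model, ["<user>", "hi"], 3, 32))
```

Forcing the marker rather than truncating the output keeps the transcript well formed, so the model can still produce an answer after its reasoning is cut short.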

Caveats

The “Activation Quantization (CUDA)” section name suggests that some features may not be Mac-native, but the source text available for this summary was truncated, so the details are unconfirmed
Model-specific docs exist for only a subset of supported architectures; the rest rely on generic behavior

Verdict

Mac developers who want to run modern VLMs locally without wrestling with PyTorch MPS should grab this. If you’re on Linux with NVIDIA GPUs, the value proposition is thinner: MLX is Apple’s playground.
