Is vllm-metal open source?

Yes — vllm-project/vllm-metal is open source, released under the Apache-2.0 license.

What language is vllm-metal written in?

vllm-project/vllm-metal is primarily written in Python.

How popular is vllm-metal?

vllm-project/vllm-metal has 1.5k stars on GitHub.

Where can I find vllm-metal?

vllm-project/vllm-metal is on GitHub at https://github.com/vllm-project/vllm-metal.

← all repositories

vllm-project/vllm-metal

vLLM breaks out of CUDA jail for Apple Silicon

A community plugin that lets vLLM run natively on Apple Silicon by wiring MLX into the inference stack, instead of leaving Mac GPUs as an afterthought.

★1.5k stars Python Inference · Serving

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

vLLM Metal is a hardware plugin that routes vLLM’s inference through Apple’s MLX framework on Apple Silicon Macs. It replaces the usual CUDA-centric path with a Metal-native backend, unifying MLX and PyTorch under one lowering path. You get the standard vLLM CLI and API running locally on M-series chips without Rosetta or x86 emulation.

The interesting bit

Rather than treating macOS as a second-class port, the plugin re-implements the core attention backend as a unified paged varlen Metal kernel. The project also bundles an experimental Rust frontend, suggesting the maintainers are rewriting hot paths with a language that actually respects memory safety.

Key highlights

Uses MLX as the primary compute backend, not a compatibility shim.
Claims an 83× improvement in time-to-first-token and 3.6× throughput in v0.2.0 over the previous release.
Supports a growing list of text-only models; full matrix lives in the docs.
Requires native arm64 Python 3.12; Rosetta is explicitly unsupported.
Optional experimental Rust frontend (vllm-rs) available via install flag.

Caveats

Model support is currently text-only; no vision or multimodal workloads are mentioned.
The optional Rust frontend is experimental and needs a separate Rust toolchain.
Native arm64 Python 3.12 is mandatory; x86_64 and Rosetta environments are explicitly blocked.

Verdict

If you run LLMs on a Mac Studio or MacBook Pro and want vLLM’s scheduling and API without a Linux box, this is your bridge. CUDA diehards and anyone needing multimodal inference should keep walking.

Frequently asked

What is vllm-project/vllm-metal?: A community plugin that lets vLLM run natively on Apple Silicon by wiring MLX into the inference stack, instead of leaving Mac GPUs as an afterthought.
Is vllm-metal open source?: Yes — vllm-project/vllm-metal is open source, released under the Apache-2.0 license.
What language is vllm-metal written in?: vllm-project/vllm-metal is primarily written in Python.
How popular is vllm-metal?: vllm-project/vllm-metal has 1.5k stars on GitHub.
Where can I find vllm-metal?: vllm-project/vllm-metal is on GitHub at https://github.com/vllm-project/vllm-metal.