vllm-project/vllm-metal
Community plugin enabling vLLM to run high-performance LLM inference on Apple Silicon Macs using MLX as the compute backend.

Velocity (7d): +7.2 ★/day · Trend: steady
vLLM Metal is a hardware plugin that brings vLLM’s LLM inference capabilities to Apple Silicon Macs. It uses Apple’s MLX framework as its primary compute backend, with unified lowering paths for PyTorch and MLX. The plugin currently supports text-only language models. Recent versions report significant performance gains, including an 83x improvement in time-to-first-token and a 3.6x improvement in throughput.
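To make the two reported figures concrete, here is a minimal sketch of how time-to-first-token (TTFT) and throughput are commonly derived from per-token timestamps, and how an "Nx" improvement is expressed as a ratio. The names `RunMetrics`, `measure`, and `speedups` are hypothetical helpers written for this illustration; they are not part of vLLM or vllm-metal, and the timestamps in the usage note are made-up numbers, not benchmark results.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class RunMetrics:
    ttft_s: float        # time-to-first-token, in seconds
    tokens_per_s: float  # generated tokens per second over the whole request


def measure(request_start: float, token_times: Sequence[float]) -> RunMetrics:
    """Derive TTFT and throughput from token arrival timestamps (seconds)."""
    if not token_times:
        raise ValueError("need at least one generated token")
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    if total <= 0:
        raise ValueError("token timestamps must come after request_start")
    return RunMetrics(ttft_s=ttft, tokens_per_s=len(token_times) / total)


def speedups(old: RunMetrics, new: RunMetrics) -> Tuple[float, float]:
    """Return (TTFT improvement, throughput improvement) as ratios.

    Lower TTFT is better, so the ratio is old / new.
    Higher throughput is better, so the ratio is new / old.
    """
    return old.ttft_s / new.ttft_s, new.tokens_per_s / old.tokens_per_s


# Illustrative timestamps only (assumed values, not measurements).
baseline = measure(0.0, [0.5, 1.0, 1.5, 2.0])
improved = measure(0.0, [0.1, 0.2, 0.3, 0.4, 0.5])
ttft_gain, throughput_gain = speedups(baseline, improved)
```

Under this convention, an "83x improvement in time-to-first-token" would mean the older TTFT divided by the newer TTFT is about 83, and a "3.6x throughput improvement" would mean the newer tokens-per-second figure is about 3.6 times the older one.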