
vllm-project/vllm-metal

Community plugin enabling vLLM to run high-performance LLM inference on Apple Silicon Macs using MLX as the compute backend.

1.3k stars · Python · Inference · Serving
Star velocity (7 days): +7.2 ★/day · Trend: steady

vLLM Metal is a hardware plugin that brings vLLM’s LLM inference capabilities to Apple Silicon Macs. It uses Apple’s MLX framework as its primary compute backend and provides unified lowering paths for both PyTorch and MLX. The plugin supports text-only language models, and recent versions report an 83x improvement in time-to-first-token (TTFT) and a 3.6x improvement in throughput.
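Speedup figures like these compare a new run against a baseline, but the ratio points in opposite directions for the two metrics. TTFT is a latency, where lower is better. Throughput is a rate, where higher is better. The Python sketch below illustrates that arithmetic. The `BenchmarkRun` type, its fields, and the sample numbers are hypothetical. They are not vllm-metal's benchmark harness or its published measurements, and the listing does not state which baseline the 83x and 3.6x figures were measured against.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One hypothetical measurement of a serving configuration."""
    ttft_s: float        # time-to-first-token in seconds (lower is better)
    tokens_per_s: float  # generation throughput in tokens/second (higher is better)


def ttft_speedup(baseline: BenchmarkRun, candidate: BenchmarkRun) -> float:
    # Latency metric: divide baseline by candidate, so an "83x" gain means
    # the candidate's TTFT is 1/83 of the baseline's.
    return baseline.ttft_s / candidate.ttft_s


def throughput_gain(baseline: BenchmarkRun, candidate: BenchmarkRun) -> float:
    # Rate metric: divide candidate by baseline.
    return candidate.tokens_per_s / baseline.tokens_per_s


if __name__ == "__main__":
    # Made-up illustrative numbers, NOT measurements from vllm-metal.
    old = BenchmarkRun(ttft_s=8.3, tokens_per_s=10.0)
    new = BenchmarkRun(ttft_s=0.1, tokens_per_s=36.0)
    print(f"TTFT speedup:    {ttft_speedup(old, new):.1f}x")
    print(f"Throughput gain: {throughput_gain(old, new):.1f}x")
```

Keeping the two helpers separate makes the direction of each ratio explicit. Dividing the two TTFT values the "throughput way" would instead yield a misleading figure below 1.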
