← all repositories

waybarrios/vllm-mlx

A vLLM-style inference server for Apple Silicon Macs that serves LLMs and vision-language models via OpenAI and Anthropic compatible APIs with continuous batching and MCP tool support.

vllm-mlx
Velocity · 7d
+7.1
★ / day
Trend
steady
star history

The server provides unified OpenAI and Anthropic API endpoints from a single process, enabling high-throughput inference with continuous batching, paged KV cache, prefix caching, and SSD-tiered caching on Metal. It supports running LLMs, vision models, audio, and embedding models natively on Apple Silicon without conversion, and includes MCP tool calling for agentic workflows, with explicit Claude Code integration.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.