waybarrios/vllm-mlx
A vLLM-style inference server for Apple Silicon Macs that serves LLMs and vision-language models via OpenAI and Anthropic compatible APIs with continuous batching and MCP tool support.

The server provides unified OpenAI and Anthropic API endpoints from a single process, enabling high-throughput inference with continuous batching, paged KV cache, prefix caching, and SSD-tiered caching on Metal. It supports running LLMs, vision models, audio, and embedding models natively on Apple Silicon without conversion, and includes MCP tool calling for agentic workflows, with explicit Claude Code integration.