Your Mac's menu bar now runs a local LLM server with SSD-backed memory
oMLX brings vLLM-style continuous batching and tiered KV caching to Apple Silicon, controlled from a native Swift menubar app.

What it does oMLX is an inference server for Apple Silicon that runs LLMs, vision models, embeddings, and rerankers locally. It exposes OpenAI- and Anthropic-compatible APIs, includes a web admin dashboard, and is managed through a native macOS menubar app built in Swift — no terminal required. Models auto-discover from a directory, or you can download them directly through the UI.
The interesting bit The standout is the tiered KV cache: hot blocks stay in RAM, cold blocks spill to SSD in safetensors format, and prefix sharing with copy-on-write means even mid-conversation context changes preserve reusable history across requests — and across server restarts. That’s the kind of memory management usually found in datacenter-grade servers, squeezed onto a MacBook.
Key highlights
- Native Swift/SwiftUI menubar app with Sparkle auto-updates, not Electron
- Multi-model serving with LRU eviction, manual pinning, per-model TTL, and a memory guard that caps total usage at system RAM minus 8 GB
- One-click integrations for Claude Code, OpenCode, Codex, Hermes Agent, Copilot, and Pi
- Built-in benchmarking, chat UI, model downloader, and offline-capable admin dashboard in six languages
- Tool calling with auto-detection for a dozen model families, plus MCP support
Caveats
- Requires macOS 15.0+ and Apple Silicon; Intel Macs are out
- The macOS app and Homebrew install are separate paths — the .dmg doesn’t include the
omlxCLI - MCP support requires an extra pip install step, even with Homebrew
Verdict Apple Silicon developers who want local LLMs for coding assistants should look here; the SSD cache and Claude Code optimizations suggest real dogfooding. Skip it if you’re on Intel, Linux, or cloud-only.