← all repositories
jundot/omlx

Your Mac's menu bar now runs a local LLM server with SSD-backed memory

oMLX brings vLLM-style continuous batching and tiered KV caching to Apple Silicon, controlled from a native Swift menubar app.

omlx
Velocity · 7d
+141
★ / day
Trend
steady
star history

What it does oMLX is an inference server for Apple Silicon that runs LLMs, vision models, embeddings, and rerankers locally. It exposes OpenAI- and Anthropic-compatible APIs, includes a web admin dashboard, and is managed through a native macOS menubar app built in Swift — no terminal required. Models auto-discover from a directory, or you can download them directly through the UI.

The interesting bit The standout is the tiered KV cache: hot blocks stay in RAM, cold blocks spill to SSD in safetensors format, and prefix sharing with copy-on-write means even mid-conversation context changes preserve reusable history across requests — and across server restarts. That’s the kind of memory management usually found in datacenter-grade servers, squeezed onto a MacBook.

Key highlights

  • Native Swift/SwiftUI menubar app with Sparkle auto-updates, not Electron
  • Multi-model serving with LRU eviction, manual pinning, per-model TTL, and a memory guard that caps total usage at system RAM minus 8 GB
  • One-click integrations for Claude Code, OpenCode, Codex, Hermes Agent, Copilot, and Pi
  • Built-in benchmarking, chat UI, model downloader, and offline-capable admin dashboard in six languages
  • Tool calling with auto-detection for a dozen model families, plus MCP support

Caveats

  • Requires macOS 15.0+ and Apple Silicon; Intel Macs are out
  • The macOS app and Homebrew install are separate paths — the .dmg doesn’t include the omlx CLI
  • MCP support requires an extra pip install step, even with Homebrew

Verdict Apple Silicon developers who want local LLMs for coding assistants should look here; the SSD cache and Claude Code optimizations suggest real dogfooding. Skip it if you’re on Intel, Linux, or cloud-only.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.