Claude Code without the cloud bill (or the cloud)
A drop-in local server that lets Anthropic's CLI agent run on Apple Silicon using MLX-native models, no API key required.

What it does
This repo wraps MLX and the ds4 engine in an Anthropic-compatible API server so that the claude CLI talks to a local model instead of Anthropic’s cloud. Swap one environment variable and you swap the brain: Gemma 4 31B, Qwen 3.5 122B, or DeepSeek V4 Flash. The author also sells pre-configured Mac minis and $19 agent packs, but the core stack is open source under the MIT license.
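To make the "swap one environment variable" idea concrete, here is a minimal sketch. The summary above does not name the variable; the sketch assumes it is ANTHROPIC_BASE_URL, the base-URL override that Claude Code reads. The localhost address and port are hypothetical placeholders, not values taken from the repo.

```python
import os


def local_claude_env(base_url, base_env=None):
    """Return a copy of the environment that points the claude CLI at base_url.

    Assumption: the CLI honours ANTHROPIC_BASE_URL. The repo's own launcher
    scripts may set additional variables that are not shown here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


# Hypothetical local endpoint; use whatever address the local server prints.
env = local_claude_env("http://localhost:8000", base_env={"PATH": "/usr/bin"})
print(env["ANTHROPIC_BASE_URL"])

# To launch the agent against the local server you would then run something like:
#   import subprocess
#   subprocess.run(["claude"], env=local_claude_env("http://localhost:8000"))
```

Building a copy of the environment instead of mutating os.environ keeps the override scoped to the one child process, so other tools in the same shell keep talking to the real API.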
The interesting bit
The “abliterated” angle: the author packages and uploads their own MLX builds to Hugging Face with the alignment fine-tuning stripped out, so the models are more pliable in agentic coding loops. For DeepSeek V4 Flash, they bolted on Antirez’s ds4 engine (pure C + Metal) within a day of release, which gives a 1M-token context and a persistent on-disk KV cache, so a 25k-token system prompt is prefilled once and then reused.
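The "prefill once" behaviour is easier to see in a toy model. The sketch below is not ds4's implementation (that engine is written in C and Metal, and its cache format is not described here); it only illustrates the general idea of keying an expensive prefill result by a hash of the prompt and storing it on disk so it survives restarts. DiskPrefixCache and fake_prefill are hypothetical names invented for this example.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskPrefixCache:
    """Toy on-disk cache: one file per distinct prompt, keyed by its SHA-256."""

    def __init__(self, cache_dir, prefill_fn):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.prefill_fn = prefill_fn
        self.prefill_calls = 0  # counts how often the expensive path runs

    def _path(self, prompt):
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.kv"

    def get_state(self, prompt):
        path = self._path(prompt)
        if path.exists():
            # Cache hit: load the stored state instead of recomputing it.
            return pickle.loads(path.read_bytes())
        self.prefill_calls += 1
        state = self.prefill_fn(prompt)
        path.write_bytes(pickle.dumps(state))
        return state


def fake_prefill(prompt):
    # Stand-in for the real forward pass over the prompt tokens.
    return {"tokens": len(prompt.split())}


cache_dir = tempfile.mkdtemp()
system_prompt = "You are a coding agent. " * 4

first = DiskPrefixCache(cache_dir, fake_prefill)
state_a = first.get_state(system_prompt)

# A second instance over the same directory simulates a server restart.
second = DiskPrefixCache(cache_dir, fake_prefill)
state_b = second.get_state(system_prompt)

print(first.prefill_calls, second.prefill_calls)
```

The second instance never calls the prefill function, which is the property that matters for a long system prompt sent unchanged at the start of every session.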
Key highlights
- Qwen 3.5 122B at 65 tok/s via MLX-native MoE (only ~10B active params per token)
- DeepSeek V4 Flash: 284B params, ~37B active, 1M context, ~32 tok/s via ds4
- Gemma 4 31B: ~15 tok/s, fits 32 GB RAM, daily driver for 64 GB Macs
- Drop-in claude replacement scripts: claude-ds4, Claude Local.command, etc.
- Voice mode and browser-remote access via companion repos (Ears+Mouth, Hands, Phone)
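The MoE figures in the list explain why the big models are usable at all: only a slice of the weights is touched per token. A quick back-of-the-envelope calculation, using only the approximate parameter counts quoted above, puts the active share at roughly 8% for Qwen and 13% for DeepSeek V4 Flash.

```python
# (total params, active params per token), both in billions, as quoted above.
MODELS = {
    "Qwen 3.5 122B": (122, 10),
    "DeepSeek V4 Flash": (284, 37),
}


def active_fraction(total_b, active_b):
    """Share of the model's weights that take part in each token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of weights active per token")
```

This says nothing about throughput by itself; the tok/s numbers also depend on memory bandwidth, quantization, and the engine, none of which are modelled here.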
Caveats
- RAM requirements are steep: 96 GB for Qwen, 128 GB for DeepSeek V4 Flash, and the 2-bit GGUF still eats 81 GB disk plus cache
- “Abliterated” means intentionally stripped safety tuning; the README treats this as a feature, but it’s a liability if you’re not airgapped
- Benchmarks shown are self-reported with YouTube demos; no independent replication cited
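The RAM caveat boils down to a lookup, sketched below with the figures quoted in this write-up (32 GB is taken as the floor for Gemma because the highlights say it fits in that much). The second helper is a rough sanity check on the 2-bit download size: raw weights alone at 2 bits per parameter come to about 71 GB, so the quoted 81 GB is plausible if some tensors are stored at higher precision, though that explanation is an assumption and not something the source states.

```python
# Minimum RAM per model in GB, as quoted in this write-up.
RAM_REQUIRED_GB = {
    "Gemma 4 31B": 32,
    "Qwen 3.5 122B": 96,
    "DeepSeek V4 Flash": 128,
}


def models_that_fit(ram_gb):
    """Return the models whose quoted RAM requirement is within ram_gb."""
    return [name for name, need in RAM_REQUIRED_GB.items() if need <= ram_gb]


def raw_weight_gb(params_billion, bits_per_weight):
    """Size of the bare weights in decimal GB, ignoring metadata and mixed precision."""
    return params_billion * bits_per_weight / 8


print(models_that_fit(64))
print(raw_weight_gb(284, 2))
```

On a 64 GB machine only Gemma clears the bar, which matches the "daily driver for 64 GB Macs" framing in the highlights.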
Verdict
Grab this if you’re a lawyer, doctor, or contractor who actually needs airgap guarantees and already owns a maxed-out Mac Studio. Skip it if you’re on a base M3 MacBook or if your threat model doesn’t justify trading cloud audit logs for locally-stripped model alignment.