← all repositories
nicedreamzapp/claude-code-local

Claude Code without the cloud bill (or the cloud)

A drop-in local server that lets Anthropic's CLI agent run on Apple Silicon using MLX-native models, no API key required.

claude-code-local
Velocity · 7d
+37
★ / day
Trend
steady
star history

What it does

This repo wraps MLX and the ds4 engine in an Anthropic-compatible API server so claude CLI talks to a local model instead of Anthropic’s cloud. Swap one environment variable and you swap the brain — Gemma 4 31B, Qwen 3.5 122B, or DeepSeek V4 Flash. The author also sells pre-configured Mac minis and $19 agent packs, but the core stack is open-source MIT.

The interesting bit

The “abliterated” angle: they package and upload their own MLX builds to HuggingFace, stripping alignment fine-tuning so the models are more pliable for agentic coding loops. For DeepSeek V4 Flash, they bolted on Antirez’s ds4 engine (pure C + Metal) within a day of release — giving 1M-token context and persistent disk KV cache so your 25k system prompt prefills once, ever.

Key highlights

  • Qwen 3.5 122B at 65 tok/s via MLX-native MoE (only ~10B active params per token)
  • DeepSeek V4 Flash: 284B params, ~37B active, 1M context, ~32 tok/s via ds4
  • Gemma 4 31B: ~15 tok/s, fits 32 GB RAM, daily driver for 64 GB Macs
  • Drop-in claude replacement scripts: claude-ds4, Claude Local.command, etc.
  • Voice mode and browser-remote access via companion repos (Ears+Mouth, Hands, Phone)

Caveats

  • RAM requirements are steep: 96 GB for Qwen, 128 GB for DeepSeek V4 Flash, and the 2-bit GGUF still eats 81 GB disk plus cache
  • “Abliterated” means intentionally stripped safety tuning; the README treats this as a feature, but it’s a liability if you’re not airgapped
  • Benchmarks shown are self-reported with YouTube demos; no independent replication cited

Verdict

Grab this if you’re a lawyer, doctor, or contractor who actually needs airgap guarantees and already owns a maxed-out Mac Studio. Skip it if you’re on a base M3 MacBook or if your threat model doesn’t justify trading cloud audit logs for locally-stripped model alignment.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.