antirez/ds4

One model, one engine: antirez bets the house on DeepSeek V4

A deliberately narrow inference engine that treats your SSD as first-class KV cache real estate.

Velocity (7d): +404 ★/day · Trend: steady

What it does

DwarfStar runs DeepSeek V4 Flash (and PRO, if you have 512GB of memory) locally on Metal and CUDA. It is not a generic GGUF loader: it ships its own quantization recipes, prompt rendering, tool calling, HTTP server, and even a coding agent. The author calls it “beta quality” and means it.

The interesting bit

The project treats the KV cache as a “first-class disk citizen,” exploiting DeepSeek’s compressed cache and fast Mac SSDs to persist state across sessions. The 2-bit quantization is deliberately selective: only the routed MoE experts get squeezed, while shared experts and projections stay pristine. The README openly admits the code was built with “strong assistance from GPT 5.5”, a disclosure that doubles as a warning.
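
To make the selective recipe concrete, here is a minimal sketch of a per-tensor quantization policy. It is hypothetical: it assumes llama.cpp-style GGUF tensor names (`ffn_*_exps` for stacked routed experts, `ffn_*_shexp` for shared experts) and uses placeholder type labels, so ds4’s actual recipe, tensor names, and quant types may well differ.

```python
import re

# Hypothetical type labels; ds4's real quantization types may differ.
LOW_BITS = "Q2"   # aggressive 2-bit quantization, routed experts only
KEEP = "KEEP"     # not squeezed to 2 bits (kept at higher precision)

# Assumes llama.cpp-style names, e.g. "blk.3.ffn_down_exps.weight" for the
# stacked routed experts and "blk.3.ffn_down_shexp.weight" for shared experts.
ROUTED_EXPERT = re.compile(r"^blk\.\d+\.ffn_(gate|up|down)_exps\.weight$")


def quant_type_for(tensor_name: str) -> str:
    """Pick a quantization type for one tensor under a selective 2-bit policy."""
    if ROUTED_EXPERT.match(tensor_name):
        return LOW_BITS
    # Shared experts, attention projections, embeddings, norms: left alone.
    return KEEP


if __name__ == "__main__":
    for name in [
        "blk.3.ffn_down_exps.weight",
        "blk.3.ffn_down_shexp.weight",
        "blk.3.attn_output.weight",
        "token_embd.weight",
    ]:
        print(f"{name:32} -> {quant_type_for(name)}")
```

Everything that is not a routed expert falls through to `KEEP`. In large MoE models most parameters typically sit in the routed experts, so that is where dropping to 2 bits saves the most memory while the always-active weights keep their precision.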

Key highlights

  • Targets 96–128GB MacBooks for Flash; 512GB for PRO
  • 1M token context window with on-disk KV cache persistence
  • Custom GGUFs with imatrix-tuned 2-bit quants; won’t run arbitrary GGUFs
  • CPU path exists only for diagnostics; macOS CPU builds currently kernel-panic the OS
  • Includes ds4-agent (alpha), speed benchmarks, and official-logit regression tests
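
On-disk KV persistence boils down to mapping a token prefix to a saved cache blob and, on a new request, resuming from the longest prefix already on disk. The sketch below is a hypothetical illustration of that idea, not ds4’s format: `DiskKVCache`, the `.kv` files, and the opaque `kv_blob` bytes are all assumptions.

```python
import hashlib
import os
import tempfile


def prefix_key(tokens: list[int]) -> str:
    """Stable filename for a token prefix (SHA-256 of the token IDs)."""
    data = ",".join(map(str, tokens)).encode()
    return hashlib.sha256(data).hexdigest()


class DiskKVCache:
    """Hypothetical on-disk KV store: one file per cached token prefix."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, tokens: list[int]) -> str:
        return os.path.join(self.root, prefix_key(tokens) + ".kv")

    def save(self, tokens: list[int], kv_blob: bytes) -> None:
        # Write to a temp file, then rename, so a crash never leaves a torn entry.
        path = self._path(tokens)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(kv_blob)
        os.replace(tmp, path)

    def longest_prefix(self, tokens: list[int]):
        """Return (n, blob) for the longest cached prefix of tokens, or (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return n, f.read()
        return 0, None


if __name__ == "__main__":
    cache = DiskKVCache(tempfile.mkdtemp())
    cache.save([1, 2, 3], b"kv-bytes")
    # Only token 4 would still need a forward pass.
    print(cache.longest_prefix([1, 2, 3, 4]))
```

The linear scan rehashes every prefix length, which is quadratic in prompt length and far too slow at 1M tokens; a real engine would checkpoint at fixed intervals or keep an index. A real key would also need to encode the model and quantization identity, since a cache produced under one GGUF is meaningless to another.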

Caveats

  • Beta quality, with some code only days old; ds4-agent is alpha
  • macOS CPU inference crashes the kernel due to an Apple VM bug the author could not work around
  • PRO support is experimental; PRO GGUF generation still relies on external llama.cpp tooling
  • MTP speculative decoding is correctness-gated and currently offers “at most a slight speedup”
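
One plausible reading of “correctness-gated” is the standard greedy acceptance rule for speculative decoding: draft tokens (here, from DeepSeek’s multi-token-prediction head) are kept only while they match what the main model would have produced, so the output is identical to plain greedy decoding. The sketch assumes a hypothetical `next_token` callback returning the main model’s argmax token; it is not taken from ds4.

```python
from typing import Callable, Sequence

# Hypothetical callback: the main model's greedy (argmax) token for a context.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(context: list[int], draft: list[int], next_token: NextToken) -> list[int]:
    """Accept draft tokens only while they match what the main model would emit.

    Returns the tokens to append: the accepted draft prefix plus one token from
    the main model (the correction at the first mismatch, or a bonus token if
    every draft token was accepted). The result equals plain greedy decoding.
    """
    accepted: list[int] = []
    for tok in draft:
        expected = next_token(context + accepted)
        if tok != expected:
            accepted.append(expected)  # main model's token replaces the bad guess
            return accepted
        accepted.append(tok)
    accepted.append(next_token(context + accepted))  # bonus token
    return accepted
```

For clarity this calls the main model once per draft position; a real engine verifies the whole draft in a single batched forward pass, which is where any speedup comes from. If drafts are frequently rejected, each verification pass yields few extra tokens, so the gain can be small.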

Verdict

Worth a look if you own a loaded Mac Studio or DGX Spark and want a polished, opinionated DeepSeek V4 experience rather than wrangling generic loaders. Skip it if you need broad model support, run Linux CPU-only, or flinch at AI-assisted C code.
