ammaarreshi/gemma-chat

Vibe code at 30,000 feet, no Wi-Fi required

An Electron app that turns Apple Silicon Macs into fully offline coding agents using Google's Gemma 4 and Apple's MLX framework.

Velocity (7d): +28 ★ / day · Trend: steady

What it does

Gemma Chat is an Electron app that runs Google’s Gemma 4 model natively on Apple Silicon via Apple’s MLX framework. You describe a project in plain English, it generates multi-file HTML/CSS/JS projects, and you watch a live preview update as the model types. After a one-time ~3 GB model download, it needs no internet, no API keys, and no cloud.

The interesting bit

The agent loop streams tokens from a local MLX server, parses XML <action> blocks on the fly, and flushes partial file writes to disk every ~450 ms so the preview iframe reloads in real time. The author chose XML over JSON for tool calling because small models handle it more reliably, a pragmatic concession to local-model limitations.
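To make that loop concrete, here is a minimal TypeScript sketch of the idea, not the project's actual code. The tag shape (`<action type="write" path="...">`), the function names, and the injected `write` callback are all assumptions for illustration; only the ~450 ms flush interval comes from the description above. Timestamps are passed in rather than read from a clock so the behavior is easy to follow.

```typescript
// Hypothetical tag shape: <action type="write" path="index.html">...</action>
// The real app's tag names and attributes may differ.
interface PartialFile {
  path: string;
  content: string;
  complete: boolean; // false while the closing tag has not arrived yet
}

const OPEN_TAG = /<action\s+type="write"\s+path="([^"]+)"\s*>/g;
const CLOSE_TAG = "</action>";

// Hold back a trailing fragment that could be the start of "</action>",
// so a half-streamed closing tag never lands in the file on disk.
// (A legitimate trailing "<" is also held back; it reappears on the next chunk.)
function trimPartialClose(text: string): string {
  for (let n = Math.min(CLOSE_TAG.length - 1, text.length); n > 0; n--) {
    if (text.endsWith(CLOSE_TAG.slice(0, n))) {
      return text.slice(0, text.length - n);
    }
  }
  return text;
}

// Re-scan everything streamed so far and return each file seen,
// including the one that is still being written.
function parseActions(buffer: string): PartialFile[] {
  const files: PartialFile[] = [];
  OPEN_TAG.lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = OPEN_TAG.exec(buffer)) !== null) {
    const path = match[1] ?? "";
    const start = match.index + match[0].length;
    const end = buffer.indexOf(CLOSE_TAG, start);
    if (end === -1) {
      files.push({ path, content: trimPartialClose(buffer.slice(start)), complete: false });
      break;
    }
    files.push({ path, content: buffer.slice(start, end), complete: true });
    OPEN_TAG.lastIndex = end + CLOSE_TAG.length;
  }
  return files;
}

// Allows at most one flush per interval (450 ms, per the description above).
class FlushThrottle {
  private lastFlush = Number.NEGATIVE_INFINITY;
  private readonly intervalMs: number;

  constructor(intervalMs: number = 450) {
    this.intervalMs = intervalMs;
  }

  shouldFlush(nowMs: number): boolean {
    if (nowMs - this.lastFlush < this.intervalMs) return false;
    this.lastFlush = nowMs;
    return true;
  }
}

// Feed timestamped chunks through the parser; `write` stands in for the
// real disk write that triggers the preview reload.
function runStream(
  chunks: Array<{ atMs: number; text: string }>,
  write: (path: string, content: string) => void,
): void {
  const throttle = new FlushThrottle();
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk.text;
    if (throttle.shouldFlush(chunk.atMs)) {
      for (const file of parseActions(buffer)) write(file.path, file.content);
    }
  }
  // Final flush once the stream ends, so the last tokens are never lost.
  for (const file of parseActions(buffer)) write(file.path, file.content);
}
```

The sketch also hints at why XML suits this job: a half-finished `<action>` body is already usable file content, whereas a half-finished JSON string would still need unescaping and a closing quote before anything could be written.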

Key highlights

  • Build mode with sandboxed workspace + live preview canvas; Chat mode with tool use (web search, bash, calculator, URL fetch)
  • Hot-swap between four Gemma 4 variants (1.5 GB to 18 GB) mid-conversation
  • Local speech-to-text via in-browser Whisper (transformers.js WASM)
  • Auto-provisions Python venv, installs MLX-LM, and downloads model on first launch
  • Per-conversation filesystem isolation with local HTTP server for previews
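
The per-conversation isolation in the last highlight could look something like the TypeScript sketch below. This is an illustration under assumptions, not the app's code: `workspaceFor` and `resolveInWorkspace` are hypothetical names, paths are treated as POSIX strings (reasonable for a macOS-only app), and symlinks are not considered.

```typescript
// Hypothetical helper: one directory per conversation under a base folder.
function workspaceFor(baseDir: string, conversationId: string): string {
  // Restrict IDs to a safe alphabet so an ID can never contain a path separator.
  if (!/^[A-Za-z0-9_-]+$/.test(conversationId)) {
    throw new Error(`invalid conversation id: ${conversationId}`);
  }
  return `${baseDir.replace(/\/+$/, "")}/${conversationId}`;
}

// Hypothetical helper: map a model-supplied relative path to a location
// inside the workspace, rejecting anything that would climb out of it.
function resolveInWorkspace(workspaceRoot: string, requested: string): string {
  if (requested.startsWith("/")) {
    throw new Error(`absolute path rejected: ${requested}`);
  }
  const parts: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) {
        throw new Error(`path escapes workspace: ${requested}`);
      }
      parts.pop();
      continue;
    }
    parts.push(segment);
  }
  if (parts.length === 0) throw new Error("empty path");
  return `${workspaceRoot.replace(/\/+$/, "")}/${parts.join("/")}`;
}
```

Validating every path the model emits before writing is what keeps a stray `../` in generated output from touching files outside the conversation's folder.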

Caveats

  • macOS + Apple Silicon only; no Intel Mac or Linux/Windows support
  • The 27B and 31B models need at least 16 GB and 32 GB of RAM, respectively
  • Up to 40 agent rounds per user message, so complex requests can get slow
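
The round cap in the last caveat can be pictured as a bounded loop like the TypeScript sketch below. The `generate` and `execute` callbacks, the `ModelTurn` shape, and the return value are hypothetical stand-ins; only the limit of 40 comes from the project description.

```typescript
// Hypothetical shape of one model response: free text plus any tool actions.
type ModelTurn = { text: string; actions: string[] };

type AgentResult = { rounds: number; finished: boolean; history: string[] };

async function runAgent(
  userMessage: string,
  generate: (history: string[]) => Promise<ModelTurn>, // stand-in for the local model call
  execute: (action: string) => Promise<string>,        // stand-in for running one tool action
  maxRounds: number = 40,
): Promise<AgentResult> {
  const history: string[] = [userMessage];
  for (let round = 1; round <= maxRounds; round++) {
    const turn = await generate(history);
    history.push(turn.text);
    // A turn with no actions means the model considers the task done.
    if (turn.actions.length === 0) {
      return { rounds: round, finished: true, history };
    }
    // Otherwise run each action and feed its result back into the context.
    for (const action of turn.actions) {
      history.push(await execute(action));
    }
  }
  // Cap reached: stop rather than loop forever on a confused model.
  return { rounds: maxRounds, finished: false, history };
}
```

Each round is a full generation pass on local hardware, so a request that uses most of its 40 rounds will take correspondingly long to finish.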

Verdict

Grab this if you want a fully offline “vibe coding” setup on a recent Mac and don’t mind trading cloud-model smarts for privacy. Skip it if you need cross-platform support, collaborative features, or the reasoning power of frontier models.
