jamiepine/voicebox

A local ElevenLabs + WisprFlow that actually runs on your laptop

Voicebox bundles seven TTS engines, Whisper dictation, and MCP agent hooks into a single Tauri app — all offline.

29.5k stars · TypeScript · Image · Video · Audio · Agents
Velocity (7d): +221 ★ / day · Trend: steady

What it does

Voicebox is a desktop AI voice studio that handles both sides of the voice loop: text-to-speech and speech-to-text. It can clone voices from a few seconds of audio, generate speech in 23 languages across seven different TTS engines, and dictate into any text field via a global hotkey. It also exposes a REST API and MCP server so Claude Code, Cursor, or Cline can speak back to you in voices you own.

The interesting bit

The project doesn’t bet on one model. It ships seven TTS engines, from the 82M-parameter Kokoro to HumeAI’s 3B TADA, and lets you switch per-generation based on whether you need speed, multilingual coverage, or paralinguistic tags like [laugh]. Everything runs locally via MLX on Apple Silicon, CUDA on Windows, or PyTorch on Linux/AMD/Intel. The dictation layer is unusually polished: accessibility-verified auto-paste on macOS, atomic clipboard preservation, and an on-screen pill that mirrors the same UI agents use when they speak to you.

Key highlights

  • Seven switchable TTS engines, including Qwen3-TTS, LuxTTS (~1GB VRAM, 150× realtime on CPU), and Chatterbox Turbo with emotion tags
  • Zero-shot voice cloning plus 50+ curated preset voices
  • Global push-to-talk dictation with Whisper STT, plus optional LLM refinement that cleans up ums and false starts
  • Multi-track “Stories” editor for conversations and podcasts with version pinning per clip
  • Post-processing effects via Spotify’s Pedalboard (pitch, reverb, chorus, compression)
  • Auto-chunking with crossfade for up to 50,000 characters of continuous generation
  • Built with Tauri (Rust), not Electron
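
The auto-chunking highlight above can be illustrated with a rough sketch. The review does not show how Voicebox implements it, so everything here is an assumption: the function names `split_text`, `crossfade`, and `stitch`, the `max_chars` and `overlap` parameters, and the choice of a linear fade are all hypothetical. The idea is to split long text at sentence boundaries, synthesize each chunk separately, and blend adjacent audio buffers so the joins are not audible.

```python
import re


def split_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample buffers, linearly blending the end of a into the start of b."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(tail[i] * (1 - w) + b[i] * w)
    return head + blended + b[overlap:]


def stitch(buffers: list[list[float]], overlap: int) -> list[float]:
    """Crossfade a sequence of per-chunk audio buffers into one continuous buffer."""
    out: list[float] = []
    for buf in buffers:
        out = crossfade(out, buf, overlap) if out else list(buf)
    return out
```

In this sketch each chunk from `split_text` would go to a TTS engine, and the resulting sample buffers would be passed to `stitch`. A production version would more likely operate on NumPy arrays and use an equal-power fade, but the structure would be similar.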

Caveats

  • Linux has no pre-built binaries yet; build from source required
  • Paralinguistic tags only work with Chatterbox Turbo; other engines read [laugh] literally
  • macOS Intel and Apple Silicon are separate downloads
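
One way to work around the tag caveat above is to strip bracketed tags before sending text to an engine that does not understand them. This is a minimal sketch, not something Voicebox is known to do: the `prepare_text` helper, the `engine_supports_tags` flag, and the tag pattern are all hypothetical.

```python
import re

# Assumed tag shape: a short bracketed word or phrase such as [laugh] or [clears throat].
TAG_PATTERN = re.compile(r"\[[a-z_ ]+\]", re.IGNORECASE)


def prepare_text(text: str, engine_supports_tags: bool) -> str:
    """Return text unchanged for tag-aware engines; otherwise remove bracketed tags."""
    if engine_supports_tags:
        return text
    stripped = TAG_PATTERN.sub("", text)
    # Remove the space left in front of punctuation, then collapse double spaces.
    stripped = re.sub(r"\s+([.,!?;:])", r"\1", stripped)
    return re.sub(r"\s{2,}", " ", stripped).strip()
```

For example, `prepare_text("That was great [laugh] really.", False)` yields `"That was great really."`, while passing `True` leaves the tag in place for an engine that can render it.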

Verdict

Worth a look if you want cloud-quality voice I/O without the cloud bill or privacy tradeoffs. Skip it if you need a simple drop-in API: this is a full desktop application with local GPU requirements.
