A desktop ElevenLabs clone that keeps your voice data off the internet
OmniVoice Studio runs voice cloning, dubbing, and dictation locally on macOS, Windows, and Linux — no API keys, no cloud, no subscription.

What it does
OmniVoice Studio is a desktop app for voice cloning, text-to-speech design, video dubbing, and real-time dictation. It claims 646 languages, zero-shot voice cloning from 3-second clips, and a full dubbing pipeline: YouTube URL or file → transcribe → translate → re-voice → export MP4. Everything runs locally; no API keys or accounts required.
The interesting bit
The project isn’t a single model — it’s a Tauri-based GUI that orchestrates multiple swappable backends. For TTS it bundles OmniVoice (default), CosyVoice 3, MLX-Audio, VoxCPM2, MOSS-TTS-Nano, and KittenTTS. For ASR it uses WhisperX by default with Faster-Whisper and MLX Whisper as opt-ins. The app auto-detects CUDA, Apple Silicon MPS, ROCm, or CPU, and offloads to CPU when VRAM is ≤8 GB. There’s also an MCP server so Claude or Cursor can trigger voice operations.
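The device fallback described above can be sketched as a small decision function. This is a minimal illustration, not the app's actual code: the names `plan_device` and `DevicePlan`, the priority order among accelerators, and the handling of unknown VRAM are all assumptions. Only the 8 GB threshold comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the description: offload when VRAM is <= 8 GB.
VRAM_OFFLOAD_THRESHOLD_GB = 8.0


@dataclass(frozen=True)
class DevicePlan:
    device: str        # "cuda", "rocm", "mps", or "cpu"
    cpu_offload: bool  # True when part of the work should move to the CPU


def plan_device(has_cuda: bool, has_rocm: bool, has_mps: bool,
                vram_gb: Optional[float] = None) -> DevicePlan:
    """Pick an accelerator, then decide whether to offload to the CPU.

    The CUDA > ROCm > MPS > CPU order is an assumed priority, not a
    documented one.
    """
    if has_cuda:
        device = "cuda"
    elif has_rocm:
        device = "rocm"
    elif has_mps:
        device = "mps"
    else:
        # Nothing to offload from when already running on the CPU.
        return DevicePlan("cpu", cpu_offload=False)

    # Unknown VRAM (None) is treated as "do not offload" in this sketch.
    offload = vram_gb is not None and vram_gb <= VRAM_OFFLOAD_THRESHOLD_GB
    return DevicePlan(device, offload)


if __name__ == "__main__":
    print(plan_device(True, False, False, vram_gb=6.0))
    print(plan_device(False, False, True))
```

In a real app the boolean flags would come from the ML framework's own device checks; they are passed in here so the decision logic stays testable without a GPU.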
Key highlights
- Multi-engine, not monolithic: Switch TTS/ASR backends in settings or via env vars; subclass TTSBackend to add an engine in ~50 lines.
- Dictation widget: Global hotkey (⌘+⇧+Space) transcribes from any app, auto-pastes, then disappears.
- Batch queue: Drop up to 50 videos and walk away; per-job progress bars.
- Speaker diarization: Pyannote + WhisperX to identify who spoke when.
- AI watermarking: Meta’s AudioSeal for invisible watermarks that survive compression.
- Cross-platform installers: DMG, MSI, AppImage, and .deb for v0.2.7.
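To give a feel for the multi-engine design, here is a rough sketch of what a pluggable backend with env-var selection could look like. The project's real TTSBackend interface is not shown in this write-up, so everything below is hypothetical: the `synthesize` method, the registry, the toy `BeepBackend`, and the `EXAMPLE_TTS_BACKEND` variable name are stand-ins, not the app's API.

```python
import math
import os
import struct
from abc import ABC, abstractmethod
from typing import Dict, Type


class TTSBackend(ABC):
    """Hypothetical stand-in for the app's base class; the real one may differ."""
    name: str = ""
    sample_rate: int = 16_000

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return mono 16-bit little-endian PCM audio for the text."""


_REGISTRY: Dict[str, Type[TTSBackend]] = {}


def register(cls: Type[TTSBackend]) -> Type[TTSBackend]:
    """Class decorator that makes a backend selectable by name."""
    _REGISTRY[cls.name] = cls
    return cls


@register
class BeepBackend(TTSBackend):
    """Toy engine: one 50 ms, 440 Hz tone per word, in place of a real model."""
    name = "beep"

    def synthesize(self, text: str) -> bytes:
        samples_per_word = self.sample_rate // 20  # 50 ms of audio
        frames = []
        for _ in text.split():
            for n in range(samples_per_word):
                angle = 2 * math.pi * 440 * n / self.sample_rate
                frames.append(struct.pack("<h", int(32767 * 0.3 * math.sin(angle))))
        return b"".join(frames)


def load_backend(default: str = "beep") -> TTSBackend:
    # EXAMPLE_TTS_BACKEND is a made-up variable name for this sketch.
    name = os.environ.get("EXAMPLE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown TTS backend {name!r}; known: {known}") from None


if __name__ == "__main__":
    audio = load_backend().synthesize("hello local world")
    print(len(audio), "bytes of PCM")
```

The point of a registry like this is that adding an engine touches only the new subclass, which is roughly how a "~50 lines per engine" claim becomes plausible.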
Caveats
- Active beta: The README warns that things may break between releases and recommends running from source for the latest fixes.
- Hugging Face token needed for some features: Diarization and other features that rely on gated models require you to set up a Hugging Face token.
- Platform support varies by engine: MLX-Audio is Apple Silicon only, several engines are Linux/macOS-only or CPU-only, and Windows CUDA support varies by backend.
- License is FSL-1.1-ALv2: Free for personal use; commercial use requires a separate license.
Verdict
Worth a look if you want ElevenLabs-style voice tools without cloud lock-in or per-character billing. Skip it if you need production stability today — the beta warnings and source-build recommendation suggest it’s still rough around the edges.