debpalash/OmniVoice-Studio

A desktop ElevenLabs clone that keeps your voice data off the internet

OmniVoice Studio runs voice cloning, dubbing, and dictation locally on macOS, Windows, and Linux — no API keys, no cloud, no subscription.

Star velocity (7d): +111 ★ / day · Trend: steady

What it does

OmniVoice Studio is a desktop app for voice cloning, text-to-speech design, video dubbing, and real-time dictation. It claims support for 646 languages, zero-shot voice cloning from 3-second clips, and a full dubbing pipeline: YouTube URL or file → transcribe → translate → re-voice → export MP4. Everything runs locally; no API keys or accounts are required.
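
The dubbing flow can be read as a chain of stages, each consuming the previous stage's output. The Python sketch below shows only that shape. The `Segment` type, the stage functions, and the toy translation table are placeholders invented for illustration; none of them come from OmniVoice Studio's code, and the real stages would call ASR, translation, and TTS models.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """One timed span of speech (hypothetical type for this sketch)."""
    start: float          # seconds from the start of the video
    end: float
    text: str             # transcript or translated text
    lang: str = "en"
    audio: bytes = b""    # synthesized audio for this span, if any


Stage = Callable[[List[Segment]], List[Segment]]


def transcribe_stub(_: List[Segment]) -> List[Segment]:
    # Stands in for an ASR step; returns fixed segments.
    return [Segment(0.0, 1.5, "hello"), Segment(1.5, 3.0, "world")]


def make_translate(target: str, table: Dict[str, str]) -> Stage:
    # Stands in for a translation step; a lookup table replaces the model.
    def translate(segs: List[Segment]) -> List[Segment]:
        return [replace(s, text=table.get(s.text, s.text), lang=target) for s in segs]
    return translate


def revoice_stub(segs: List[Segment]) -> List[Segment]:
    # Stands in for TTS; the "audio" here is just the encoded text.
    return [replace(s, audio=s.text.encode("utf-8")) for s in segs]


def run_pipeline(stages: List[Stage], segs: Optional[List[Segment]] = None) -> List[Segment]:
    """Feed each stage's output into the next one."""
    segs = segs or []
    for stage in stages:
        segs = stage(segs)
    return segs


dubbed = run_pipeline([
    transcribe_stub,
    make_translate("es", {"hello": "hola", "world": "mundo"}),
    revoice_stub,
])
```

Keeping the timestamps on every segment is what lets a final export step line the new audio up with the original video.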

The interesting bit

The project isn’t a single model — it’s a Tauri-based GUI that orchestrates multiple swappable backends. For TTS it bundles OmniVoice (default), CosyVoice 3, MLX-Audio, VoxCPM2, MOSS-TTS-Nano, and KittenTTS. For ASR it uses WhisperX by default with Faster-Whisper and MLX Whisper as opt-ins. The app auto-detects CUDA, Apple Silicon MPS, ROCm, or CPU, and offloads to CPU when VRAM is ≤8 GB. There’s also an MCP server so Claude or Cursor can trigger voice operations.
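
The device selection described above could look roughly like the following Python sketch. It is a guess at the decision logic, not the app's implementation: the function name, the priority order (CUDA, then ROCm, then MPS, then CPU), and the choice to apply the 8 GB rule only to discrete GPUs are all assumptions, and the capability flags are passed in by hand instead of being probed from a real framework.

```python
from typing import NamedTuple, Optional


class Placement(NamedTuple):
    device: str        # plain label for this sketch, not a framework device string
    cpu_offload: bool  # whether part of the model should be kept on the CPU


# Threshold taken from the description: offload when VRAM is 8 GB or less.
VRAM_OFFLOAD_THRESHOLD_GB = 8.0


def choose_placement(
    cuda: bool = False,
    rocm: bool = False,
    mps: bool = False,
    vram_gb: Optional[float] = None,
) -> Placement:
    """Pick a device label and decide whether to offload to CPU (hypothetical helper)."""
    if cuda or rocm:
        label = "cuda" if cuda else "rocm"
        # Unknown VRAM is treated as "enough"; that is an assumption of this sketch.
        offload = vram_gb is not None and vram_gb <= VRAM_OFFLOAD_THRESHOLD_GB
        return Placement(label, offload)
    if mps:
        # Apple Silicon uses unified memory, so no VRAM check is applied here (assumption).
        return Placement("mps", False)
    return Placement("cpu", False)
```

For example, `choose_placement(cuda=True, vram_gb=6.0)` selects the GPU with offloading enabled, while calling it with no flags falls back to the CPU.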

Key highlights

  • Multi-engine, not monolithic: Switch TTS/ASR backends in settings or via env vars; subclass TTSBackend to add an engine in ~50 lines.
  • Dictation widget: Global hotkey (⌘+⇧+Space) transcribes from any app, auto-pastes, then disappears.
  • Batch queue: Drop up to 50 videos and walk away; per-job progress bars.
  • Speaker diarization: Pyannote + WhisperX to identify who spoke when.
  • AI watermarking: Meta’s AudioSeal for invisible watermarks that survive compression.
  • Cross-platform installers: DMG, MSI, AppImage, and .deb for v0.2.7.
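
To give a feel for the multi-engine design in the first highlight, here is a minimal Python sketch of a pluggable backend with env-var selection. Only the class name `TTSBackend` comes from the project description; the `synthesize` method, the `register` decorator, the registry, and the `EXAMPLE_TTS_BACKEND` variable are hypothetical, and the project's real interface may look quite different.

```python
import math
import os
import struct
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Type


class TTSBackend(ABC):
    """Base class for engines (method name and signature are assumptions)."""
    name: str = ""

    @abstractmethod
    def synthesize(self, text: str, sample_rate: int = 16000) -> bytes:
        """Return 16-bit little-endian mono PCM for the given text."""


_REGISTRY: Dict[str, Type[TTSBackend]] = {}


def register(cls: Type[TTSBackend]) -> Type[TTSBackend]:
    # Hypothetical registration hook: makes an engine selectable by name.
    _REGISTRY[cls.name] = cls
    return cls


@register
class BeepBackend(TTSBackend):
    """Toy engine: 10 ms of a 440 Hz tone per input character."""
    name = "beep"

    def synthesize(self, text: str, sample_rate: int = 16000) -> bytes:
        n = len(text) * sample_rate // 100
        samples = (int(12000 * math.sin(2 * math.pi * 440 * i / sample_rate)) for i in range(n))
        return b"".join(struct.pack("<h", s) for s in samples)


@register
class SilenceBackend(TTSBackend):
    """Toy engine: 10 ms of silence per input character."""
    name = "silence"

    def synthesize(self, text: str, sample_rate: int = 16000) -> bytes:
        return b"\x00\x00" * (len(text) * sample_rate // 100)


def load_backend(default: str = "beep", env: Mapping[str, str] = os.environ) -> TTSBackend:
    """Instantiate the engine named by EXAMPLE_TTS_BACKEND (a made-up variable)."""
    name = env.get("EXAMPLE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

With this layout, adding an engine means writing one subclass and registering it; callers keep using `load_backend().synthesize(...)` and never name a concrete engine.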

Caveats

  • Active beta: The README warns things may break between releases and recommends running from source for latest fixes.
  • Hugging Face token required: Some features, such as diarization, depend on gated models and need a Hugging Face token.
  • Uneven platform support: MLX-Audio is Apple Silicon only, several engines are Linux/macOS-only or CPU-only, and Windows CUDA support varies by backend.
  • License is FSL-1.1-ALv2: Free for personal use; commercial use requires a separate license.

Verdict

Worth a look if you want ElevenLabs-style voice tools without cloud lock-in or per-character billing. Skip it if you need production stability today — the beta warnings and source-build recommendation suggest it’s still rough around the edges.
