rzru/nightingale

Your music library, minus the vocals, plus a scoreboard

Nightingale turns any audio file into a karaoke machine using stem separation and Whisper transcription, no Python wrangling required.

1.1k stars · TypeScript · Domain: Apps · Velocity (7d): +13 ★/day · Trend: steady

What it does

Nightingale is a desktop karaoke app that ingests your local music folder, Jellyfin, or Navidrome library; strips out lead vocals with ML models (UVR Karaoke or Demucs); transcribes lyrics with word-level timing via WhisperX; and serves it all back with pitch scoring, key/tempo shifts, and reactive backgrounds. It ships as a single binary that bootstraps its own Python environment and downloads models on first launch.

The interesting bit

The heavy lifting runs as a persistent local analyzer process that speaks JSON over a loopback TCP socket, so model load and CUDA init happen once per session, not per song. Results are cached by blake3 hash; re-analysis only triggers if the source changes or you decide to transpose the track. It is a small architectural choice that keeps a complex pipeline from feeling like a batch job.
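
To make the idea concrete, here is a minimal sketch of the two pieces described above: framing JSON messages over a stream socket, and keying cached results so that only a changed file or a new transposition forces re-analysis. Everything named here is an assumption for illustration, not Nightingale's actual code: the `AnalyzeRequest` shape, the newline-delimited framing, and the `hash:transposition` key format are all hypothetical, and the content hash is passed in as a string rather than computed with blake3.

```typescript
// Hypothetical request shape; the real analyzer's message schema is not
// documented in this write-up.
interface AnalyzeRequest {
  id: number;
  path: string;
  transposeSemitones: number;
}

// Assumed framing: one JSON document per line (newline-delimited JSON).
function encodeMessage(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// Reassembles complete JSON lines from arbitrary TCP chunk boundaries.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "") and stays buffered.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line));
  }
}

// Assumed cache key: content hash plus transposition, so editing the file
// or shifting the key both miss the cache. `contentHashHex` stands in for
// the blake3 digest mentioned above.
function cacheKey(contentHashHex: string, transposeSemitones: number): string {
  return `${contentHashHex}:${transposeSemitones}`;
}

class AnalysisCache<T> {
  private entries = new Map<string, T>();

  // Returns the cached result, or runs `analyze` once and stores it.
  getOrAnalyze(key: string, analyze: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;
    const result = analyze();
    this.entries.set(key, result);
    return result;
  }
}
```

Because the analyzer process stays alive between requests, the client only pays for `encodeMessage` and a socket write per song; the expensive model load sits behind a connection that is opened once.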

Key highlights

  • Self-contained setup: ffmpeg, Python 3.10, PyTorch, and ML packages install automatically; no manual dependency hunting.
  • Multiple library sources: local folders, Jellyfin, Navidrome, or self-hosted web mode for network playback.
  • CJK support: per-character forced alignment with Hepburn, pinyin, or Revised Romanization overlays.
  • Pluggable ASR: Whisper by default, or experimental Parakeet v3 for ~25 European languages.
  • GPU shaders + video backgrounds: 10 audio-reactive GPU effects plus Pixabay video loops and source-video sync.
  • Gamepad-native UI: full navigation and playback control with a controller, adaptive scaling to 4K.

Caveats

  • macOS users must strip Gatekeeper quarantine attributes manually (xattr -cr) because the app is not Apple-signed.
  • Linux builds lack auto-updates; the in-app updater just opens the GitHub Releases page.
  • CPU analysis is glacial: 10–20 minutes per song versus 2–5 minutes on GPU, and MPS on Apple Silicon falls back to CPU for WhisperX alignment.
  • UltraStar Deluxe and Parakeet v3 support are both marked experimental.
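
To put the per-song analysis times in perspective, the sketch below scales them to a whole library. It assumes songs are analyzed one after another with no parallelism, and the 300-song library is a made-up example; only the per-song minute ranges come from the caveat above.

```typescript
// Per-song analysis times quoted in the caveat above, in minutes.
const MINUTES_PER_SONG = {
  cpu: { min: 10, max: 20 },
  gpu: { min: 2, max: 5 },
} as const;

type Device = keyof typeof MINUTES_PER_SONG;

// Estimated wall-clock hours to analyze `songCount` songs sequentially.
function estimateHours(
  songCount: number,
  device: Device
): { min: number; max: number } {
  const perSong = MINUTES_PER_SONG[device];
  return {
    min: (songCount * perSong.min) / 60,
    max: (songCount * perSong.max) / 60,
  };
}

// Hypothetical 300-song library.
const cpuEstimate = estimateHours(300, "cpu"); // { min: 50, max: 100 }
const gpuEstimate = estimateHours(300, "gpu"); // { min: 10, max: 25 }
```

Since results are cached, this is a one-time cost per track rather than a recurring one, but on CPU it still means days of background processing for a mid-sized collection.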

Verdict

Worth a look if you want a private, self-hosted karaoke rig without wiring together Demucs, Whisper, and a web frontend yourself. Skip it if you need instant analysis on underpowered hardware or a polished, code-signed macOS experience.
