gabrielmittag/NISQA

A speech quality rater that tells you *why* your call sounds awful

NISQA rates transmitted speech on overall quality plus four diagnostic dimensions, not just one blunt MOS number, and scores synthetic speech for naturalness.

Star velocity (7d): +0.4 ★/day · Trend: steady

What it does

NISQA is a deep-learning speech quality predictor that works without a clean reference track. Feed it a degraded audio file and it returns an overall quality score plus four specific culprits: Noisiness, Coloration, Discontinuity, and Loudness. A separate model variant, NISQA-TTS, rates how natural synthetic speech from TTS or voice conversion systems sounds.

The interesting bit

Most quality metrics give you a single number and shrug. NISQA v2.0 breaks down why quality suffered, which helps when you are debugging codecs, network glitches, or pipeline choices. It also ships as a configurable training framework: pick a CNN or feed-forward front end, model time dependencies with self-attention or an LSTM, and go double-ended if you have reference audio, all controlled through YAML files.
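To make the "why" concrete, here is a minimal sketch of how per-dimension scores could be turned into a one-line diagnosis. It assumes the scores sit on a 1 to 5 MOS-style scale where higher is better; the dictionary keys and both helper functions are hypothetical and do not reflect NISQA's actual output format.

```python
# Hypothetical key names for the four diagnostic dimensions; these are
# illustrative and not taken from NISQA's real output files.
DIMENSIONS = ("noisiness", "coloration", "discontinuity", "loudness")


def weakest_dimension(scores):
    """Return (name, score) for the lowest-rated quality dimension."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise KeyError(f"missing dimension scores: {missing}")
    name = min(DIMENSIONS, key=lambda d: scores[d])
    return name, scores[name]


def summarize(scores):
    """Format the overall score together with the weakest dimension."""
    name, value = weakest_dimension(scores)
    return f"overall {scores['overall']:.1f}, weakest: {name} ({value:.1f})"


# Made-up scores for a call that suffers mostly from dropouts.
example = {
    "overall": 2.4,
    "noisiness": 3.9,
    "coloration": 3.5,
    "discontinuity": 1.8,
    "loudness": 4.1,
}
print(summarize(example))  # overall 2.4, weakest: discontinuity (1.8)
```

A low discontinuity score alongside healthy noisiness and loudness would point at packet loss or clipping rather than a noisy microphone, which is the kind of triage a single MOS number cannot give you.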

Key highlights

  • Pre-trained weights for transmitted speech (NISQA v2.0) and synthesized speech (NISQA-TTS v1.0)
  • Single-ended and double-ended prediction modes
  • Modular architecture: CNN/DFF → Self-Attention/LSTM → various pooling strategies
  • Includes a corpus of 14,000+ labeled speech samples with real-world degradation (Zoom, Skype, mobile, packet loss)
  • Fine-tuning and transfer learning supported via CSV + YAML workflow
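The CSV half of that fine-tuning workflow can be sketched roughly as follows. This is only an illustration of preparing a label file with a train/validation split: the column names (`filepath`, `mos`, `split`) and the `build_label_csv` helper are assumptions, so check the repository's README and YAML templates for the names the training script actually expects.

```python
import csv
import io
import random


def build_label_csv(rows, val_fraction=0.2, seed=0):
    """Build CSV text from (filepath, mos) pairs with a train/val split column.

    Column names are hypothetical, not NISQA's documented schema.
    """
    rows = list(rows)
    for path, mos in rows:
        if not 1.0 <= mos <= 5.0:
            raise ValueError(f"MOS out of range for {path}: {mos}")

    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_val = round(len(rows) * val_fraction)

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["filepath", "mos", "split"])
    for i, (path, mos) in enumerate(rows):
        writer.writerow([path, f"{mos:.2f}", "val" if i < n_val else "train"])
    return buf.getvalue()


# Made-up file names and ratings, for illustration only.
labels = [
    ("calls/a.wav", 4.2),
    ("calls/b.wav", 2.1),
    ("calls/c.wav", 3.3),
    ("calls/d.wav", 1.7),
    ("calls/e.wav", 4.8),
]
csv_text = build_label_csv(labels)
print(csv_text)
```

Fixing the seed keeps the validation set stable across runs, which matters when comparing fine-tuned models against each other.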

Caveats

  • Model weights are licensed under CC BY-NC-SA 4.0, so non-commercial use only
  • The Wiki is referenced repeatedly but marked “not yet added” in the README
  • Stereo files require manual channel selection; no automatic downmixing mentioned
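If you would rather preprocess stereo files yourself, the two obvious options are picking one channel or averaging both. The sketch below works on interleaved integer PCM samples (left, right, left, right, ...); both functions are hypothetical helpers and not part of NISQA, and reading or writing the actual audio file is left out.

```python
def select_channel(interleaved, channel, n_channels=2):
    """Keep only one channel from interleaved PCM samples."""
    if not 0 <= channel < n_channels:
        raise ValueError(f"channel must be in [0, {n_channels})")
    return list(interleaved[channel::n_channels])


def downmix_to_mono(interleaved, n_channels=2):
    """Average all channels of each frame (integer average, rounded down)."""
    if len(interleaved) % n_channels:
        raise ValueError("sample count is not a multiple of n_channels")
    return [
        sum(interleaved[i:i + n_channels]) // n_channels
        for i in range(0, len(interleaved), n_channels)
    ]


# Three stereo frames: (100, 300), (-200, 200), (7, 8)
frames = [100, 300, -200, 200, 7, 8]
left = select_channel(frames, 0)   # [100, -200, 7]
mono = downmix_to_mono(frames)     # [200, 0, 7]
print(left, mono)
```

Channel selection is the safer choice when one channel carries the speech and the other carries something else; averaging can introduce cancellation if the channels are out of phase.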

Verdict

Worth a look if you build VoIP pipelines, evaluate TTS output, or need to train custom perceptual metrics. Skip it if you need a fully open commercial license or a polished, documented API.
