huawei-noah/Speech-Backbones

Huawei's speech lab dumps three diffusion papers in one repo

A single landing pad for Grad-TTS, SPIRAL, and DiffVC — because maintaining separate repos is apparently harder than probabilistic diffusion modeling.

604 stars · Jupyter Notebook · Image · Video · Audio · Inference · Serving
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

This is Huawei Noah’s Ark Lab’s monorepo for speech research code. It currently houses three projects: Grad-TTS (a text-to-speech system using diffusion probabilistic models), SPIRAL (a self-supervised speech pre-training method), and DiffVC (a diffusion-based voice converter). Each comes with its own paper and author list, but they share the same README real estate.

The interesting bit

The diffusion obsession is notable — two of the three projects use diffusion models for generative speech tasks, which suggests the lab went deep on that particular wave before the rest of the field pivoted to flow matching. The SPIRAL paper’s angle is more unusual: it learns representations invariant to artificial perturbations, a kind of “what doesn’t kill the spectrogram makes it stronger” approach to pre-training.
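To make the "invariant to artificial perturbations" idea concrete, here is a toy sketch, not SPIRAL's actual code or objective. It assumes a one-layer encoder, additive Gaussian noise as the perturbation, a squared-distance loss, and a teacher that slowly tracks the student by moving average. All function names and parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def encode(weights, frame):
    """Toy encoder (assumption): one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]


def perturb(frame, rng, noise_std):
    """Artificial perturbation (assumption): additive Gaussian noise per feature."""
    return [x + rng.gauss(0.0, noise_std) for x in frame]


def invariance_loss(student_w, teacher_w, frame, rng, noise_std=0.1):
    """Squared distance between teacher(clean input) and student(perturbed input)."""
    target = encode(teacher_w, frame)
    pred = encode(student_w, perturb(frame, rng, noise_std))
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def ema_update(teacher_w, student_w, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight."""
    return [
        [decay * t + (1.0 - decay) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_w, student_w)
    ]


rng = random.Random(0)
student = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
teacher = [row[:] for row in student]      # teacher starts as a copy of the student
frame = [0.5, -0.2, 0.1, 0.9]              # stand-in for one spectrogram frame

loss = invariance_loss(student, teacher, frame, rng)
teacher = ema_update(teacher, student)
print(f"invariance loss on one perturbed frame: {loss:.6f}")
```

Minimizing a loss of this shape pushes the student to produce the same representation whether or not the input was perturbed, which is the intuition behind the one-liner above. The real method's perturbations and loss will differ from this sketch.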

Key highlights

  • Grad-TTS: ICML 2021, text-to-speech built on a diffusion probabilistic model
  • SPIRAL: ICLR 2022, self-supervised learning with perturbation invariance
  • DiffVC: ICLR 2022 Oral, voice conversion with “fast maximum likelihood sampling”
  • All three include official implementations and arXiv links
  • Jupyter Notebook is the listed language, suggesting notebook-heavy demos or training workflows

Caveats

  • The README is essentially a table of contents with paper links; no installation, usage, or model weights are visible
  • The README appears to have no screenshots or architecture diagrams
  • It’s unclear whether these are actively maintained or snapshot releases for paper reproducibility

Verdict

Worth a bookmark if you’re tracing the diffusion-for-speech lineage or need official Grad-TTS/DiffVC baselines for comparison. Skip it if you want turnkey training scripts or a unified framework — this is a paper-code drop, not a product.
