open-mmlab/Amphion

A Swiss Army knife for making computers sing, speak, and sound

Amphion bundles nearly every major audio generation task into one reproducible toolkit, with a side of architecture visualizations for the confused.

Velocity (7d): +11 ★ / day · Trend: steady

What it does

Amphion is a Python toolkit that wraps up the full stack of audio generation research: text-to-speech, singing voice synthesis, voice conversion, accent conversion, singing voice conversion, and text-to-audio. It also ships vocoders, evaluation metrics, and pre-trained models like MaskGCT and Vevo2. The project explicitly targets junior researchers who need a leg up in a field that can feel like drinking from a firehose.

The interesting bit

The maintainers embed visualizations of classic model architectures directly into the toolkit. It is a small touch, but it signals where Amphion sits on the spectrum: less “production API,” more “pedagogical research platform.” They also curate and release massive datasets (the Emilia family now exceeds 200,000 hours of speech data), so you can actually train the things.

Key highlights

  • Supports TTS, SVS, VC, AC, SVC, and TTA in one codebase, with TTM still in development
  • Ships SOTA models: MaskGCT (non-autoregressive TTS), Vevo/Vevo2 (zero-shot voice imitation), Metis (unified speech foundation model), DualCodec (low-frame-rate neural codec)
  • Provides architecture visualizations to help newcomers understand how models tick
  • Curates the Emilia and Emilia-Large datasets (roughly 101k and 200k+ hours, respectively) with open preprocessing pipelines
  • MIT licensed, with models and demos hosted on HuggingFace and ModelScope

Caveats

  • The README is a firehose of news badges and model announcements; finding stable, task-specific documentation means clicking through to sub-READMEs
  • “TTM: Text to Music” is marked as developing, so do not expect it to Just Work yet

Verdict

Grab this if you are a researcher or grad student who wants to compare voice conversion techniques without rebuilding five different repos from scratch. Skip it if you need a polished, single-purpose API for a product; Amphion is a lab bench, not a SaaS.
