
Aratako/Irodori-TTS

A Flow Matching-based Text-to-Speech model that generates speech using diffusion-inspired techniques with continuous latent representations.

861 stars · Python · Image · Video · Audio
Velocity (7d): +8.4 ★/day · Trend: steady

Irodori-TTS is a text-to-speech model based on flow matching. It follows the Echo-TTS architecture and uses DACVAE continuous latents as the generation target. The model supports voice cloning and emoji-driven style control for expressive speech synthesis. It is available as a base model (500M parameters, v3) and a VoiceDesign variant (600M parameters) with a multi-branch architecture.
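To make the flow matching idea concrete, here is a minimal, generic sketch of how sampling in such a model can work: start from Gaussian noise and integrate a velocity field from t=0 to t=1 to arrive at a latent. This is not code from the Irodori-TTS repository. The names `toy_velocity` and `euler_sample`, the three-element "latent", and the choice of a fixed-step Euler integrator are all assumptions made for illustration. In a real model a trained network, conditioned on text and a reference voice, would replace the closed-form velocity.

```python
import random

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path toward a single target point, the exact
    flow matching velocity is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def euler_sample(velocity_fn, x0, target, num_steps=16):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the division in toy_velocity is safe
        v = velocity_fn(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
target_latent = [0.5, -1.0, 2.0]  # made-up values, not real DACVAE latents
noise = [random.gauss(0.0, 1.0) for _ in target_latent]  # Gaussian prior sample
result = euler_sample(toy_velocity, noise, target_latent)
```

With this closed-form velocity the trajectory is a straight line from the noise sample to the target, so the final Euler step lands on `target_latent` up to floating-point rounding. A learned velocity field is only approximate, which is why the number of integration steps typically trades speed against quality.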
