Aratako/Irodori-TTS
A Flow Matching-based Text-to-Speech model that generates speech using diffusion-inspired techniques with continuous latent representations.

Velocity (7d): +8.4 ★ / day · Trend: steady
Irodori-TTS implements a flow-matching text-to-speech model that follows the Echo-TTS architecture and uses DACVAE continuous latents as the generation target. The model supports voice cloning and emoji-driven style control for expressive speech synthesis. It is offered as a base model (500M parameters, v3) and a VoiceDesign variant (600M parameters) with a multi-branch architecture.
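
To make the flow-matching idea concrete, here is a minimal sketch of how sampling in a continuous latent space can work. It is not taken from the Irodori-TTS code. The names `toy_velocity` and `euler_sample` are hypothetical, and the velocity function is a hand-written stand-in for a trained network, which in a real system would presumably be conditioned on the text and a reference voice. The sampler starts from Gaussian noise and integrates the velocity field from t = 0 to t = 1 with plain Euler steps.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained velocity network.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to a single known target is
    (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(velocity_fn, dim, num_steps=8, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (latent)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1.0, so the toy field stays finite
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Assumed toy "latent" standing in for one frame of a continuous audio latent.
target_latent = [0.5, -1.0, 2.0]
sample = euler_sample(lambda x, t: toy_velocity(x, t, target_latent), dim=3)
print(sample)
```

In this toy setup the sampler lands on `target_latent` because the velocity field was built from it directly. In an actual model the network only learns such a field from data, and the resulting latents would then be passed to the latent decoder (DACVAE, per the description above) to produce a waveform.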