ming024/FastSpeech2
A PyTorch implementation of Microsoft's FastSpeech 2 neural text-to-speech model for generating speech audio from text.

This repository provides a PyTorch implementation of Microsoft's FastSpeech 2, a neural text-to-speech architecture. It supports multi-speaker synthesis in English and Mandarin, with datasets including LibriTTS and AISHELL-3. It includes training pipelines and inference scripts, and it uses neural vocoders such as MelGAN and HiFi-GAN to convert the generated mel-spectrograms into waveform audio.