ictnlp/StreamSpeech
StreamSpeech is a multi-task neural model that performs simultaneous speech-to-text and speech-to-speech translation, along with speech recognition and speech synthesis.

StreamSpeech is an end-to-end speech processing model that unifies automatic speech recognition, speech translation, and speech synthesis in a single architecture. It supports both offline and streaming (simultaneous) modes for speech-to-text and speech-to-speech translation. The model is trained with multi-task learning and reports state-of-the-art results for both offline and simultaneous speech-to-speech translation on the CVSS benchmark.
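To illustrate the difference between the offline and streaming modes, here is a minimal Python sketch of a generic wait-k read/write loop. Wait-k is a standard textbook simultaneous policy swapped in for illustration; it is not StreamSpeech's own policy. `wait_k_decode`, `toy_translate_next`, and `LEXICON` are hypothetical names, and the word-for-word "translator" is a stand-in for a real model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A translator sees the source prefix read so far and the target tokens already
# written, and returns the next target token (or None if it has nothing to emit).
Translator = Callable[[Sequence[str], Sequence[str]], Optional[str]]


def wait_k_decode(
    source_chunks: Sequence[str], translate_next: Translator, k: int
) -> Tuple[List[str], List[int]]:
    """Hypothetical wait-k loop (not StreamSpeech's policy).

    Returns the target tokens and, for each token, how many source chunks
    had been read when it was emitted (a simple latency measure).
    """
    read: List[str] = []
    written: List[str] = []
    delays: List[int] = []
    for chunk in source_chunks:
        read.append(chunk)  # READ one more source chunk
        if len(read) >= k:  # after the first k chunks, WRITE at most one token per READ
            token = translate_next(read, written)
            if token is not None:
                written.append(token)
                delays.append(len(read))
    # Source exhausted: flush any remaining target tokens
    while (token := translate_next(read, written)) is not None:
        written.append(token)
        delays.append(len(read))
    return written, delays


# Hypothetical toy lexicon for a monotonic word-for-word "translation"
LEXICON = {"bonjour": "hello", "le": "the", "monde": "world"}


def toy_translate_next(read: Sequence[str], written: Sequence[str]) -> Optional[str]:
    """Toy stand-in for a model: translate the next unread-aligned source word."""
    i = len(written)
    return LEXICON.get(read[i], read[i]) if i < len(read) else None


if __name__ == "__main__":
    src = ["bonjour", "le", "monde"]
    print(wait_k_decode(src, toy_translate_next, k=2))         # (['hello', 'the', 'world'], [2, 3, 3])
    print(wait_k_decode(src, toy_translate_next, k=len(src)))  # (['hello', 'the', 'world'], [3, 3, 3])
```

Setting `k` to the number of source chunks makes the loop wait for the whole input before writing anything, which reproduces offline behavior. A smaller `k` lowers latency (the `delays` list) at the cost of translating from less source context.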