inclusionAI/Ming-UniAudio
Ming-UniAudio is a speech language model that unifies speech understanding, generation, and editing within a single end-to-end framework using a continuous speech tokenizer.

Its unified continuous speech tokenizer bridges semantic and acoustic features, which lets one end-to-end model serve as a foundation for both understanding and generation in the speech domain. Alongside this foundation model, the project includes a dedicated speech editing model. That model performs free-form edits guided by natural-language instructions, without requiring the user to manually specify which region of the audio to edit.