opendilab/CleanS2S
A single-file Speech-to-Speech prototype agent enabling real-time voice conversations with streaming and GPT-4o integration.

CleanS2S is a speech-to-speech interactive agent implemented in a single file that provides high-quality, streaming, bidirectional voice interaction. It uses GPT-4o for language understanding and response generation, combines speech recognition and speech synthesis for audio input and output, and includes a subjective action judgement module that lets the agent proactively initiate actions during a conversation. The project demonstrates a Linguistic User Interface (LUI) paradigm for voice-based AI interaction.
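
To make the data flow described above concrete, here is a minimal sketch of one conversational turn: recognize speech, decide whether to act proactively, then stream a reply sentence by sentence through synthesis. This is not CleanS2S's actual code. Every name in it (`Turn`, `stub_asr`, `stub_llm`, `stub_tts`, `judge_action`, `run_turn`) is hypothetical, the three stages are stubs that pass text around as bytes instead of calling real recognition, GPT-4o, or synthesis services, and the rule inside `judge_action` is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Turn:
    """One chunk of agent output: the text and the audio synthesized from it."""
    text: str
    audio: bytes


def stub_asr(audio_chunks: Iterable[bytes]) -> str:
    # Stand-in for speech recognition: the "audio" here is just UTF-8 text.
    return b"".join(audio_chunks).decode("utf-8")


def stub_llm(prompt: str) -> Iterator[str]:
    # Stand-in for a streaming language model call; yields one sentence at a time.
    yield f"You said: {prompt}."
    yield "How can I help?"


def stub_tts(text: str) -> bytes:
    # Stand-in for speech synthesis: encodes text instead of producing a waveform.
    return text.encode("utf-8")


def judge_action(transcript: str) -> Optional[str]:
    # Hypothetical action-judgement rule: speak up proactively on silence.
    if not transcript.strip():
        return "Are you still there?"
    return None


def run_turn(
    audio_chunks: Iterable[bytes],
    asr: Callable[[Iterable[bytes]], str] = stub_asr,
    llm: Callable[[str], Iterator[str]] = stub_llm,
    tts: Callable[[str], bytes] = stub_tts,
    judge: Callable[[str], Optional[str]] = judge_action,
) -> Iterator[Turn]:
    """Yield synthesized reply chunks as soon as each sentence is available."""
    transcript = asr(audio_chunks)
    proactive = judge(transcript)
    if proactive is not None:
        # The agent initiates an action instead of answering the transcript.
        yield Turn(proactive, tts(proactive))
        return
    for sentence in llm(transcript):
        # Synthesize per sentence so playback can start before the reply ends.
        yield Turn(sentence, tts(sentence))


if __name__ == "__main__":
    for turn in run_turn([b"hel", b"lo"]):
        print(turn.text)
```

Because `run_turn` is a generator, a caller can begin playing the first sentence while later ones are still being produced, which is the property the word "streaming" refers to. Passing the stages in as parameters is only a convenience for this sketch: it lets each stub be replaced by a real service without changing the control flow.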