
opendilab/CleanS2S

A single-file Speech-to-Speech prototype agent enabling real-time voice conversations with streaming and GPT-4o integration.

527 stars | Python | Agents | Image · Video · Audio
Velocity (7d): +0.9 ★ / day. Trend: steady.

CleanS2S is a speech-to-speech interactive agent implemented in a single file that provides high-quality, streaming, bidirectional voice interaction. It uses GPT-4o for language understanding and response generation, combines speech recognition and synthesis for audio I/O, and includes a subjective action judgement module that lets the agent proactively initiate actions during a conversation. The project demonstrates a Linguistic User Interface (LUI) paradigm for voice-based AI interaction.
