
huggingface/speech-to-speech

A modular speech-to-speech pipeline for building local voice agents using open-source STT, LLM, and TTS models.

Velocity (7d): +7.3 ★/day · Trend: steady

This repository implements a cascaded speech-to-speech pipeline that chains four stages: voice activity detection (VAD), speech-to-text (Whisper), a language model, and text-to-speech (TTS) synthesis. The pipeline is built on Hugging Face Transformers and runs locally on a range of devices, including Apple Silicon via MLX. It offers several usage modes (real-time, server/client, and WebSocket), and the backend for each pipeline stage is configurable.
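A cascaded pipeline like this can be pictured as independent stages joined by queues, where each stage consumes the previous stage's output. The sketch below is a minimal illustration assuming a thread-per-stage design with `queue.Queue` hand-offs. The stage functions (`vad`, `stt`, `llm`, `tts`) and the helpers `run_stage`, `build_pipeline`, and `run` are hypothetical stubs written for this example; they are not the repository's actual classes or API, and no real models are loaded.

```python
import queue
import threading
from typing import Callable, List, Optional

# Sentinel object signalling end-of-stream between stages (hypothetical convention).
_STOP = object()


def run_stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Consume items from inbox, apply fn, forward non-None results to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        result = fn(item)
        if result is not None:  # a stage may drop an item, e.g. VAD on silence
            outbox.put(result)


# --- Hypothetical stub stages; real backends would wrap VAD, Whisper, an LLM, and TTS ---
def vad(chunk: bytes) -> Optional[bytes]:
    # Stand-in for voice activity detection: treat all-zero chunks as silence.
    return chunk if chunk.strip(b"\x00") else None


def stt(audio: bytes) -> str:
    # Stand-in for speech-to-text: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")


def llm(prompt: str) -> str:
    # Stand-in for a language model reply.
    return f"You said: {prompt}"


def tts(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the reply back to bytes.
    return text.encode("utf-8")


def build_pipeline(stages: List[Callable]):
    """Wire stages with FIFO queues; return (input queue, output queue, threads)."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    return queues[0], queues[-1], threads


def run(chunks: List[bytes]) -> List[bytes]:
    """Push chunks through VAD -> STT -> LLM -> TTS and collect the outputs in order."""
    inbox, outbox, threads = build_pipeline([vad, stt, llm, tts])
    for chunk in chunks:
        inbox.put(chunk)
    inbox.put(_STOP)
    outputs = []
    while (item := outbox.get()) is not _STOP:
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


if __name__ == "__main__":
    # The silent chunk is dropped by the VAD stub; the other two produce replies.
    print(run([b"hello", b"\x00\x00", b"how are you"]))
```

Because each stage runs in its own thread and hands work off through a queue, a slow downstream stage does not stop upstream stages from accepting new input in this sketch. Swapping a backend amounts to passing a different callable for that stage, which is one plausible way to realize per-stage configurable backends; it is an assumption here, not a description of how the project implements them.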
