WebRTC for Python devs who'd rather write functions than protocols
FastRTC turns any Python function into a real-time audio or video stream, handling the signaling plumbing so you don't have to.

What it does
FastRTC wraps your Python handler in a Stream object and exposes it over WebRTC or WebSockets. You write echo(audio) or detection(image); the library handles the browser negotiation, media tracks, and turn-taking. It also bundles a Gradio UI, FastAPI mounting, and even a temporary phone number via fastphone().
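As a rough illustration of that wrapping step, here is a minimal sketch of an echo stream. The argument names (handler, modality, mode) follow the project's README at the time of writing and may change; build_stream() and main() are hypothetical helpers added here so the handler can be tested without the library installed, and ReplyOnPause is assumed to need the VAD extra.

```python
def echo(audio):
    """Handler: receives a (sample_rate, samples) tuple for one user turn.

    Yielding the same tuple back plays the caller's own audio to them.
    """
    yield audio


def build_stream():
    # Imported lazily so echo() can be exercised without fastrtc installed.
    # Assumption: ReplyOnPause requires the VAD extra, fastrtc[vad].
    from fastrtc import ReplyOnPause, Stream

    return Stream(
        handler=ReplyOnPause(echo),  # run echo() each time the speaker pauses
        modality="audio",
        mode="send-receive",
    )


def main():
    # Opens the bundled Gradio page for testing in a browser.
    build_stream().ui.launch()
```

Calling main() should be enough to try the echo in a browser; the handler itself stays an ordinary generator with no WebRTC code in it.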
The interesting bit
The ReplyOnPause handler is the quiet convenience: it detects when the user stops speaking, buffers the audio, and triggers your response logic automatically. For voice AI demos, that removes the usual “hold-to-talk” jank without you writing VAD code.
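To make the idea concrete, the sketch below shows the kind of logic a pause-triggered handler saves you from writing. It is not FastRTC's implementation: a crude energy threshold stands in for a real voice activity detector, and split_turns_on_pause, threshold, and pause_frames are hypothetical names and values chosen for illustration.

```python
from math import sqrt


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def split_turns_on_pause(frames, on_turn, threshold=0.01, pause_frames=3):
    """Buffer speech and call on_turn(buffer) after a sustained pause.

    threshold and pause_frames are illustrative, not FastRTC defaults.
    """
    buffer, silent_run, heard_speech = [], 0, False
    replies = []
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech: keep buffering and reset the silence counter.
            heard_speech = True
            silent_run = 0
            buffer.append(frame)
        elif heard_speech:
            # Silence after speech: count it toward the pause.
            silent_run += 1
            buffer.append(frame)
            if silent_run >= pause_frames:
                # Pause confirmed: hand over the turn minus trailing silence.
                replies.append(on_turn(buffer[:-silent_run]))
                buffer, silent_run, heard_speech = [], 0, False
    return replies
```

A short gap inside a sentence does not end the turn, because the silence counter resets as soon as speech resumes; only pause_frames consecutive quiet frames trigger the reply.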
Key highlights
- One-liner Gradio UI: stream.ui.launch() for instant browser testing
- FastAPI integration: stream.mount(app) gives you WebRTC and WebSocket endpoints
- Built-in telephone bridge: stream.fastphone() spins up a callable number (Hugging Face token required)
- Optional VAD + TTS extras via pip install "fastrtc[vad,tts]"
- Cookbook spans Gemini, OpenAI, Claude, Whisper, YOLOv10, and Kyutai Moshi
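The FastAPI route from the list above might look roughly like the following. This is a sketch under assumptions: the Stream arguments follow the README at the time of writing, detection() is a placeholder that returns frames untouched, and build_app() is a hypothetical factory introduced here so the imports only run when the app is actually built.

```python
def detection(image):
    """Placeholder video handler: returns each frame unchanged.

    A real handler would run a model here and draw on the frame.
    """
    return image


def build_app():
    # Lazy imports: fastapi and fastrtc are only needed when serving.
    from fastapi import FastAPI
    from fastrtc import Stream

    stream = Stream(handler=detection, modality="video", mode="send-receive")
    app = FastAPI()
    stream.mount(app)  # adds the stream's WebRTC and WebSocket endpoints
    return app
```

Assuming the file is saved as a module named server (a hypothetical name), something like uvicorn server:build_app --factory should serve it; you still have to supply your own frontend, since mounting does not include the Gradio UI.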
Caveats
- The repo language tag says JavaScript, but the library is Python; likely a GitHub metadata quirk from the Gradio frontend components
- fastphone() and some demos depend on Hugging Face infrastructure; portability beyond that stack is unclear
Verdict
Worth a look if you’re prototyping voice or video AI and want to skip the WebRTC rabbit hole. If you already run custom SFUs or need fine-grained codec control, this abstraction will feel like a straitjacket.