
collabora/WhisperFusion

A real-time conversational AI system combining Whisper speech-to-text with the Mistral LLM, optimized with TensorRT for low-latency voice interactions.

Star velocity (7d): +1.8 ★ / day. Trend: steady.

WhisperFusion is a real-time voice pipeline: Whisper transcribes incoming speech, and the transcript is fed directly to the Mistral large language model (LLM), so users can hold a spoken conversation with the AI. The system uses NVIDIA TensorRT-LLM to optimize both the Whisper model and the LLM for inference, and applies torch.compile to WhisperSpeech for additional performance gains. It requires a GPU with at least 24 GB of memory, typically an RTX 4090, to reach the low latency needed for real-time use.
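
The flow described above (speech in, transcript to the LLM, reply back out as speech) can be pictured as a chain of stages connected by queues. The following is a minimal sketch of that shape only, not WhisperFusion's actual code: `transcribe`, `generate_reply`, `synthesize`, `stage`, and `run_pipeline` are hypothetical names, the three model functions are trivial placeholders standing in for Whisper, the LLM, and a text-to-speech model, and the thread-per-stage layout is an assumption made for illustration.

```python
import queue
import threading

# Hypothetical stand-ins for the real models. None of these names
# come from WhisperFusion; they only mark where each model would sit.
def transcribe(audio_chunk: bytes) -> str:
    """Placeholder for speech-to-text (Whisper in the real system)."""
    return audio_chunk.decode("utf-8")

def generate_reply(prompt: str) -> str:
    """Placeholder for the LLM."""
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    """Placeholder for text-to-speech."""
    return text.encode("utf-8")

_STOP = object()  # sentinel that tells each stage to shut down

def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(item))

def run_pipeline(audio_chunks):
    """Push audio chunks through the three stages and collect the output."""
    q_audio, q_text, q_reply, q_speech = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=stage, args=(transcribe, q_audio, q_text)),
        threading.Thread(target=stage, args=(generate_reply, q_text, q_reply)),
        threading.Thread(target=stage, args=(synthesize, q_reply, q_speech)),
    ]
    for worker in workers:
        worker.start()
    for chunk in audio_chunks:
        q_audio.put(chunk)
    q_audio.put(_STOP)

    results = []
    while True:
        out = q_speech.get()
        if out is _STOP:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results

if __name__ == "__main__":
    print(run_pipeline([b"hello", b"how are you"]))
```

Running each stage in its own thread lets a later stage start on the first chunk while an earlier stage is still handling the next one, which is one common way to keep end-to-end latency low in a streaming pipeline.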
