Is WhisperFusion open source?

Yes — collabora/WhisperFusion is an open-source project tracked on heatdrop.

What language is WhisperFusion written in?

collabora/WhisperFusion is primarily written in Python.

How popular is WhisperFusion?

collabora/WhisperFusion has 1.6k stars on GitHub.

Where can I find WhisperFusion?

collabora/WhisperFusion is on GitHub at https://github.com/collabora/WhisperFusion.

collabora/WhisperFusion

A Voice-to-Voice LLM Stack for Beefy GPUs

It marries real-time speech recognition, an LLM, and speech synthesis into a single low-latency pipeline so you can talk to an AI instead of typing.

★1.6k stars · Python · Chat Assistants · Language Models · Inference & Serving

What it does

WhisperFusion is integration plumbing: it combines Collabora’s WhisperLive, WhisperSpeech, and an LLM such as Mistral or Phi into a voice-driven conversation stack. Both the LLM and Whisper run as TensorRT engines, while WhisperSpeech is accelerated via torch.compile, all aimed at minimizing round-trip delay. A bundled web GUI exposes the result as a hands-free demo.
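To make the stage ordering concrete, here is a minimal Python sketch of one conversational turn. The type aliases, the VoicePipeline class, and the lambda stand-ins are hypothetical placeholders for WhisperLive, the TensorRT-LLM engine, and WhisperSpeech. They are not WhisperFusion's actual API. The sketch assumes a strictly sequential loop, so the round-trip delay it reports is simply the sum of the three stage latencies.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage signatures standing in for WhisperLive (speech-to-text),
# an LLM engine (text generation) and WhisperSpeech (text-to-speech).
# These are NOT WhisperFusion's real interfaces.
SpeechToText = Callable[[bytes], str]
Generate = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: Generate
    tts: TextToSpeech
    timings: dict[str, float] = field(default_factory=dict)

    def _timed(self, name: str, fn: Callable, arg):
        # Measure wall-clock time of a single stage and store it by name.
        start = time.perf_counter()
        result = fn(arg)
        self.timings[name] = time.perf_counter() - start
        return result

    def respond(self, audio_in: bytes) -> bytes:
        """Run one voice-in, voice-out turn, recording per-stage latency."""
        text = self._timed("stt", self.stt, audio_in)
        reply = self._timed("llm", self.llm, text)
        return self._timed("tts", self.tts, reply)

    def round_trip(self) -> float:
        # In this sequential sketch, round-trip delay is the sum of all stages.
        return sum(self.timings.values())


if __name__ == "__main__":
    pipeline = VoicePipeline(
        stt=lambda audio: audio.decode("utf-8"),   # toy stand-in for WhisperLive
        llm=lambda prompt: f"You said: {prompt}",  # toy stand-in for the LLM
        tts=lambda text: text.encode("utf-8"),     # toy stand-in for WhisperSpeech
    )
    out = pipeline.respond(b"hello")
    print(out)  # b'You said: hello'
    print(f"round trip: {pipeline.round_trip() * 1000:.3f} ms")
```

Because each stage is injected as a plain callable, the toy lambdas could be swapped for real model wrappers without touching the timing logic. Whether the real project overlaps stages (for example by streaming partial transcripts) is outside what this sketch models.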

The interesting bit

The project is less about training new models and more about aggressive optimization: it relies on Nvidia's TensorRT-LLM (plus torch.compile for WhisperSpeech) to fit the entire voice-in, voice-out loop on a single GPU, with an RTX 4090 and its 24 GB of VRAM serving as the authors' reference platform. Multi-GPU setups are supported, but even the baseline assumes enthusiast-grade hardware.
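A back-of-envelope calculation helps show why 24 GB is treated as the floor. The sketch below assumes FP16 weights at 2 bytes per parameter and uses approximate public parameter counts (about 2.7B for Phi-2, 3.8B for Phi-3-mini, 7.3B for Mistral 7B). It deliberately ignores the KV cache, activations, TensorRT workspace, and the Whisper and WhisperSpeech models, all of which share the same card.

```python
# Rough FP16 weight-memory estimate: parameters x 2 bytes.
# Parameter counts are approximate public figures (an assumption); the KV cache,
# activations, TensorRT workspace, Whisper and WhisperSpeech are NOT included.
BYTES_PER_PARAM_FP16 = 2
GPU_MEMORY_GB = 24  # the RTX 4090 reference card from the text

MODELS = {
    "Phi-2": 2.7e9,
    "Phi-3-mini": 3.8e9,
    "Mistral-7B": 7.3e9,
}


def fp16_weights_gb(num_params: float) -> float:
    """Memory for the LLM weights alone in FP16, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9


def headroom_gb(num_params: float, gpu_gb: float = GPU_MEMORY_GB) -> float:
    """Memory left for everything else after loading the LLM weights."""
    return gpu_gb - fp16_weights_gb(num_params)


if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name:<11} weights ~{fp16_weights_gb(params):5.1f} GB, "
              f"headroom ~{headroom_gb(params):5.1f} GB")
```

Under these assumptions a Mistral-class 7B model takes roughly 14.6 GB for its weights alone, leaving under 10 GB for the speech models and runtime buffers, while the Phi models leave considerably more room.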

Key highlights

End-to-end voice pipeline: speech-to-text, LLM reasoning, and text-to-speech in one stack.
TensorRT engines for the LLM and Whisper; torch.compile for WhisperSpeech.
Packaged as a Docker Compose setup with a pre-built Web GUI.
Multi-GPU support via TensorRT-LLM for scaling performance.
The Docker setup defaults to Phi-2 or Phi-3-mini, although the project intro mentions Mistral.

Caveats

Requires at least 24 GB of GPU memory; optimal latency essentially demands RTX 4090-class FP16 throughput.
The README contains an empty NOTE section, suggesting documentation is unfinished.
The default model in the Docker setup is Phi, while the project description highlights Mistral, so the exact LLM configuration is ambiguous from the README alone.

Verdict

Worth a look if you own a high-end Nvidia GPU and want an offline, voice-driven AI assistant without assembling the pipeline yourself. Everyone else should probably wait for a lighter port.
