ictnlp/LLaMA-Omni

Llama learns to listen and speak with sub-230ms latency

This project bolts speech understanding and generation onto Llama-3.1-8B-Instruct to create a low-latency, open voice assistant.

★3.1k stars · Python · Image · Video · Audio Language Models

What it does

LLaMA-Omni is an end-to-end speech interaction model built on Llama-3.1-8B-Instruct. You speak to it, and it simultaneously generates a text response and a spoken audio response. The authors aim for GPT-4o-level voice capability on top of this 8B base model.
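The simultaneous text-and-speech output can be pictured as one interleaved stream. The Python sketch below is only a toy illustration of that idea, not LLaMA-Omni's code or API. `interleave_text_and_speech`, `toy_units` and `toy_vocoder` are hypothetical stand-ins, and the sketch assumes the speech side is produced as discrete units that a vocoder turns into audio chunks. The per-token unit mapping is invented purely for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, Union

# An event is ("text", token) or ("audio", samples).
Event = Tuple[str, Union[str, List[float]]]


def interleave_text_and_speech(
    text_tokens: Iterable[str],
    token_to_units: Callable[[str], List[int]],
    vocoder: Callable[[List[int]], List[float]],
    chunk_units: int = 3,
) -> Iterator[Event]:
    """Yield each text token as soon as it exists, and an audio chunk whenever
    `chunk_units` discrete units have accumulated, so speech can start before
    the full text response is finished. All callables are hypothetical."""
    if chunk_units < 1:
        raise ValueError("chunk_units must be at least 1")
    pending: List[int] = []
    for token in text_tokens:
        yield ("text", token)
        pending.extend(token_to_units(token))
        # Emit every full chunk that is ready right now.
        while len(pending) >= chunk_units:
            chunk, pending = pending[:chunk_units], pending[chunk_units:]
            yield ("audio", vocoder(chunk))
    if pending:  # flush the tail of the response
        yield ("audio", vocoder(pending))


def toy_units(token: str) -> List[int]:
    # Hypothetical: every token becomes three identical unit IDs.
    return [len(token)] * 3


def toy_vocoder(units: List[int]) -> List[float]:
    # Hypothetical: one fake "sample" per unit.
    return [u / 10 for u in units]


events = list(interleave_text_and_speech(["Hi", "there"], toy_units, toy_vocoder))
print([kind for kind, _ in events])  # ['text', 'audio', 'text', 'audio']
```

Keeping text and audio in one stream lets a client display words and start playback from the same loop; in this toy, the chunk size trades playback smoothness against how early the first audio arrives.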

The interesting bit

The model was trained in under three days on just four GPUs, yet the authors report latency as low as 226 ms. It leans on existing components (Whisper for the speech encoder and a unit-based HiFi-GAN vocoder) rather than building a monolithic audio model from scratch, which keeps the hardware barrier low.
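To see why producing speech alongside text matters for a figure like 226 ms, here is a back-of-the-envelope model. It is not LLaMA-Omni's code, and every timing in it is a made-up placeholder rather than a measurement; `ToyTimings` and `time_to_first_audio_ms` are hypothetical names. It assumes speech can be synthesized in fixed-size chunks and compares that with waiting for the whole response before synthesizing anything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTimings:
    """Hypothetical per-stage costs in milliseconds (placeholders, not measurements)."""
    encode_ms: float    # encode the user's utterance once
    per_step_ms: float  # produce one generation step
    vocoder_ms: float   # turn one chunk into audio


def time_to_first_audio_ms(
    t: ToyTimings, response_steps: int, chunk_steps: Optional[int]
) -> float:
    """Latency until the first audio is ready.

    chunk_steps=None: wait for the full response before synthesizing.
    chunk_steps=k: start synthesizing after the first k steps (streaming).
    """
    if chunk_steps is None:
        steps_needed = response_steps
    else:
        if chunk_steps < 1:
            raise ValueError("chunk_steps must be at least 1")
        steps_needed = min(chunk_steps, response_steps)
    return t.encode_ms + steps_needed * t.per_step_ms + t.vocoder_ms


timings = ToyTimings(encode_ms=50.0, per_step_ms=20.0, vocoder_ms=30.0)
print(time_to_first_audio_ms(timings, response_steps=200, chunk_steps=None))  # 4080.0
print(time_to_first_audio_ms(timings, response_steps=200, chunk_steps=5))     # 180.0
```

In this toy model, streaming latency depends on the chunk size rather than on the length of the answer, which is the intuition behind sub-second figures; the 226 ms number itself comes from the authors' own reported measurements.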

Key highlights

Built on Llama-3.1-8B-Instruct, generating both text and speech at the same time.
Reported latency as low as 226 ms for speech interaction.
Trained in less than 3 days on 4 GPUs.
Code is Apache-2.0, but the model weights are restricted to academic research only (no commercial use).
The authors have already released a follow-up, LLaMA-Omni 2, with models from 0.5B to 32B parameters.
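For scale, the training claim converts into a GPU-hour ceiling. This is simple arithmetic on the stated figures; the page does not say which GPU model was used, so the result is not directly comparable across hardware.

$$
4\ \text{GPUs} \times 3\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} = 288\ \text{GPU-hours}
$$

Since training took less than three days, the full run used fewer than 288 GPU-hours.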

Caveats

The Gradio demo's streaming audio playback is unstable, so autoplay is not enabled.
The model weights carry a strict non-commercial license, so production use requires contacting the authors for a separate license.
It borrows heavily from LLaVA and SLAM-LLM for its architecture and training code.

Verdict

Researchers and hobbyists looking for a fully local, low-latency voice assistant should try it; anyone needing a production-ready, commercially licensed speech model should look elsewhere or negotiate a license.

Frequently asked

What is ictnlp/LLaMA-Omni?: This project bolts speech understanding and generation onto Llama-3.1-8B-Instruct to create a low-latency, open voice assistant.
Is LLaMA-Omni open source?: Yes. The ictnlp/LLaMA-Omni code is released under the Apache-2.0 license, but the model weights are restricted to academic research (no commercial use).
What language is LLaMA-Omni written in?: ictnlp/LLaMA-Omni is primarily written in Python.
How popular is LLaMA-Omni?: ictnlp/LLaMA-Omni has 3.1k stars on GitHub.
Where can I find LLaMA-Omni?: ictnlp/LLaMA-Omni is on GitHub at https://github.com/ictnlp/LLaMA-Omni.