ictnlp/LLaMA-Omni
End-to-end speech-language model for low-latency speech interaction, built on Llama-3.1-8B-Instruct.

Velocity (7d): +4.9 ★/day · Trend: steady
LLaMA-Omni is a speech-language model that takes spoken instructions as input and generates text and speech responses simultaneously. Built on Llama-3.1-8B-Instruct, it combines speech recognition, language understanding, and speech synthesis in a single end-to-end architecture, which supports low-latency, high-quality speech interaction.