
zhenye234/LLaSA_training

A speech synthesis model built on the LLaMA architecture that generates audio from text, scaling both train-time and inference-time compute.

Velocity (7d): +1.3 ★ / day · Trend: steady

LLaSA is a LLaMA-based neural text-to-speech system that generates natural speech from text. It scales compute during both training and inference to improve output quality. Audio is encoded with the XCodec2 codec, and text is encoded with a Llama text tokenizer (e.g., Llama-3.2-1B-Instruct). Training can run distributed via torchrun or SLURM, and the project provides 160k hours of open-source tokenized speech data for training.
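Since training can be launched with either torchrun or SLURM, a training script usually needs to work out its process rank from whichever launcher started it. The following is a minimal sketch of that step, not code taken from the repository: `DistInfo` and `resolve_dist_info` are hypothetical names, and the only facts relied on are the standard environment variables each launcher sets (`RANK`, `WORLD_SIZE`, `LOCAL_RANK` for torchrun; `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID` for SLURM).

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    """Hypothetical container for the values a distributed job needs."""
    rank: int        # global process index
    world_size: int  # total number of processes
    local_rank: int  # process index on the current node
    launcher: str    # "torchrun", "slurm", or "single"


def resolve_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read rank information from torchrun or SLURM environment variables.

    Assumption: torchrun variables take precedence when both sets are
    present (e.g. torchrun started inside a SLURM allocation).
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        return DistInfo(
            rank=int(env["RANK"]),
            world_size=int(env["WORLD_SIZE"]),
            local_rank=int(env.get("LOCAL_RANK", "0")),
            launcher="torchrun",
        )
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return DistInfo(
            rank=int(env["SLURM_PROCID"]),
            world_size=int(env["SLURM_NTASKS"]),
            local_rank=int(env.get("SLURM_LOCALID", "0")),
            launcher="slurm",
        )
    # Neither launcher detected: fall back to a single-process run.
    return DistInfo(rank=0, world_size=1, local_rank=0, launcher="single")
```

Passing the environment as a mapping keeps the function easy to test; in a real script the returned values would typically be handed to the framework's process-group initialization.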
