Is LLaDA open source?

Yes — ML-GSAI/LLaDA is an open-source project tracked on heatdrop.

What language is LLaDA written in?

ML-GSAI/LLaDA is primarily written in Python.

How popular is LLaDA?

ML-GSAI/LLaDA has 3.9k stars on GitHub.

Where can I find LLaDA?

ML-GSAI/LLaDA is on GitHub at https://github.com/ML-GSAI/LLaDA.

ML-GSAI/LLaDA

An 8B-parameter challenge to left-to-right language modeling

LLaDA trains an 8B-parameter masked diffusion model from scratch to test whether strong language models must generate left-to-right.

★3.9k stars Python Language Models Inference · Serving

What it does

LLaDA is a language model that swaps next-token prediction for masked diffusion. It uses standard Transformer blocks to reconstruct randomly masked text, and the authors trained an 8B-parameter version from scratch on 2.3 trillion tokens. They report that it is competitive with LLaMA3 8B on standard benchmarks and that, after supervised fine-tuning, it handles instruction following and multi-turn chat through the same denoising process.
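To make "reconstruct randomly masked text" concrete, here is a minimal sketch of the forward (masking) process, assuming the setup the LLaDA paper describes: draw a noise level t uniformly from [0, 1] and mask each token independently with probability t. The `MASK` string and the `forward_mask` helper are hypothetical names for illustration, not part of the repository.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the real model uses a dedicated mask token id


def forward_mask(tokens, t, rng):
    """Corrupt a token list at noise level t in [0, 1].

    Each token is independently replaced by MASK with probability t, so t = 0
    leaves the text intact and t = 1 masks every position.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
x0 = "the cat sat on the mat".split()
t = rng.random()  # noise level drawn uniformly from [0, 1)
xt = forward_mask(x0, t, rng)
print(f"t={t:.2f}", xt)
```

Reconstruction is then the learning problem: the model sees `xt` and is trained to recover the original tokens at the masked positions.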

The interesting bit

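The architecture deliberately stays close to autoregressive GPT-style models: it keeps standard Transformer blocks but drops the causal mask, so every position can attend to the whole sequence, and otherwise only the probabilistic objective differs. The authors' prior theoretical work shows that masked diffusion does not need timestep embeddings injected into the Transformer, and that the training loss is an upper bound on negative log-likelihood. Combined with a masking ratio sampled uniformly between 0 and 1 (BERT masks a fixed ratio), this effectively turns a BERT-like masking task into a proper generative model.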
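The training loss can be sketched as a one-sample estimate: sample t, mask, and score only the masked positions, weighting the cross-entropy by 1/t. This is a minimal sketch assuming the objective as stated in the LLaDA paper; `masked_diffusion_loss`, `uniform_predict`, the small `eps` floor on t, and the toy vocabulary are hypothetical choices for illustration. Note that the predictor receives only the corrupted sequence, never t.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask token


def masked_diffusion_loss(x0, predict, rng, eps=1e-3):
    """One-sample Monte Carlo estimate of the masked-diffusion training loss.

    predict(xt) returns, for each position, a dict mapping tokens to
    probabilities. It is never given t, mirroring the result that the model
    needs no timestep embedding.
    """
    t = (1 - eps) * rng.random() + eps  # keep t >= eps so 1/t stays finite
    xt = [MASK if rng.random() < t else tok for tok in x0]
    probs = predict(xt)
    # Cross-entropy on masked positions only; unmasked positions contribute nothing.
    nll = sum(-math.log(probs[i][tok]) for i, tok in enumerate(x0) if xt[i] == MASK)
    # Weight by 1/t, then divide by sequence length to get a per-token value.
    return nll / (t * len(x0))


vocab = ["the", "cat", "sat", "on", "mat"]
x0 = "the cat sat on the mat".split()


def uniform_predict(xt):
    # Toy stand-in model: ignores its input and spreads probability evenly.
    return [{tok: 1 / len(vocab) for tok in vocab} for _ in xt]


rng = random.Random(0)
estimates = [masked_diffusion_loss(x0, uniform_predict, rng) for _ in range(20000)]
print(sum(estimates) / len(estimates), math.log(len(vocab)))
```

Given t, the expected number of masked tokens is exactly t times the sequence length, so the 1/t weight cancels it on average. For the uniform toy predictor the mean estimate should therefore come out close to log 5, that predictor's per-token negative log-likelihood.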

Key highlights

The project includes evaluation hooks for lm-evaluation-harness and now supports batch inference across the base, instruct, and 1.5 variants.
An MoE spin-off, LLaDA-MoE-7B-A1B, activates roughly 1B of its 7B parameters per token while reportedly surpassing the dense 8B LLaDA 1.5 model.
The family has grown beyond text: LLaDA-V adds vision-language capabilities, and LLaDA 1.5 introduces preference alignment via variance-reduced preference optimization (VRPO).
Pre-training was largely stable: the only reported incident was a single NaN crash at 1.2T tokens, fixed by resuming training with a reduced learning rate.
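The sparse activation behind LLaDA-MoE comes from routing each token through only a few experts. The sketch below shows generic top-k routing, assuming a simple linear router and softmax gating over the chosen experts; `top_k_moe`, the eight toy experts, and k = 2 are hypothetical and do not reflect LLaDA-MoE-7B-A1B's actual configuration.

```python
import math


def top_k_moe(x, router_weights, experts, k=2):
    """Route one token vector x through only the top-k experts.

    router_weights[e] is the weight vector that scores expert e, and
    experts[e] is a callable mapping a vector to a vector of the same size.
    Returns the gated output and the indices of the experts that actually ran.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:k]
    # Softmax over the chosen experts only, so their gates sum to 1.
    top = max(scores[e] for e in chosen)
    weights = {e: math.exp(scores[e] - top) for e in chosen}
    total = sum(weights.values())
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)  # unchosen experts are never evaluated
        out = [o + (weights[e] / total) * yi for o, yi in zip(out, y)]
    return out, chosen


# Eight hypothetical experts; expert e simply scales its input by e + 1.
experts = [lambda v, s=e + 1: [s * vi for vi in v] for e in range(8)]
router = [[float(e), 0.0] for e in range(8)]  # higher e scores higher when x[0] > 0
out, used = top_k_moe([1.0, 2.0], router, experts, k=2)
print(used, out)
```

Total parameter count grows with the number of experts, while per-token compute grows only with k; that gap is how an MoE model can hold far more parameters than it activates for any single token.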

Caveats

Sampling is currently slower than with autoregressive baselines: the model works over a fixed context length, cannot yet use a KV cache, and reportedly needs as many sampling steps as there are response tokens for peak performance.
The authors do not release training code or datasets; they provide guidelines and point to the sibling SMDM repository for a reference implementation.
Fine-tuning data borrowed from autoregressive pipelines causes quirks, such as the instruct model identifying itself as “Bailing” when asked who it is.
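To see why sampling cost scales with the step count, here is a simplified sketch of iterative unmasking, loosely following the low-confidence remasking idea from the LLaDA paper: start from a fully masked response and, at each step, keep the model's most confident predictions. `sample`, `toy_predict`, and the even per-step schedule are hypothetical, and the prompt is omitted for brevity. Because attention is bidirectional, every position's activations can change whenever another token is filled in, which is why a standard KV cache does not apply directly.

```python
MASK = None  # hypothetical marker for a position that has not been decoded yet


def sample(predict, length, steps):
    """Decode a fully masked response of `length` tokens using `steps` model calls.

    predict(seq) returns a (token, confidence) pair for every position and
    stands in for one full forward pass over the whole fixed-length sequence.
    Each step commits the most confident predictions among still-masked slots.
    """
    seq = [MASK] * length
    calls = 0
    for step in range(steps):
        preds = predict(seq)
        calls += 1
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        # Spread the remaining masked slots evenly over the remaining steps.
        n_commit = -(-len(masked) // (steps - step))  # ceiling division
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_commit]
        for i in best:
            seq[i] = preds[i][0]
    return seq, calls


target = "diffusion models fill in blanks in parallel".split()


def toy_predict(seq):
    # Hypothetical stand-in model: always predicts the target, and is more
    # confident about earlier positions than later ones.
    return [(tok, 1.0 / (1 + i)) for i, tok in enumerate(target)]


out, calls = sample(toy_predict, len(target), steps=len(target))
print(" ".join(out), calls)
```

With steps equal to the response length, the loop makes one model call per generated token, the same call count as an autoregressive decoder, but each call reprocesses the full fixed-length sequence instead of reusing cached states.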

Verdict

Worth a look if you are interested in non-autoregressive language modeling or want pretrained diffusion checkpoints to probe. If you need production-ready inference speed or a turnkey training framework, this is still a research artifact.
