ML-GSAI/LLaDA
A diffusion-based large language model implemented in PyTorch, with 8B-parameter base and instruct versions plus MoE variants.

Official implementation of Large Language Diffusion Models (LLaDA), a language model that applies diffusion principles to text generation rather than image generation. The repository provides model weights for the base and instruct variants, training code, and evaluation pipelines. Extensions include LLaDA-V for vision-language tasks and LLaDA-MoE, which uses a Mixture of Experts architecture for efficient inference with only ~1B active parameters.
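
To make "diffusion for text" concrete, here is a minimal toy sketch. It assumes a masked-diffusion style sampler: generation starts from a fully masked sequence and reveals the most confident predictions over a fixed number of steps. The names `MASK`, `toy_predict`, and `iterative_unmask` are hypothetical, and the predictor is a stand-in that looks tokens up in a known target instead of running a trained network. This is not the repository's code.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol for this toy example


def toy_predict(tokens, target, rng):
    """Hypothetical stand-in for a trained mask predictor.

    For each masked position, return a (token, confidence) guess. Here the
    token is simply looked up in `target` and the confidence is random.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def iterative_unmask(target, steps=4, seed=0):
    """Start fully masked and reveal the most confident guesses each step."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        guesses = toy_predict(tokens, target, rng)
        if not guesses:
            break
        # Reveal enough positions that nothing is left masked after the last step.
        remaining_steps = steps - step
        k = -(-len(guesses) // remaining_steps)  # ceiling division
        most_confident = sorted(
            guesses, key=lambda i: guesses[i][1], reverse=True
        )[:k]
        for i in most_confident:
            tokens[i] = guesses[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "diffusion models can generate text".split()
    final, history = iterative_unmask(sentence, steps=4)
    for state in history:
        print(" ".join(state))
```

Unlike left-to-right decoding, the sketch fills positions in whatever order the confidence scores dictate, so each printed state has fewer masks than the one before it.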