NVlabs/Fast-dLLM

Training-free inference acceleration for diffusion-based Large Language Models using KV cache and parallel decoding.

Fast-dLLM provides a family of techniques for accelerating inference in diffusion-based large language models (dLLMs) and vision-language models (dVLMs). The acceleration is training-free: it combines KV cache reuse with parallel decoding. The approach covers text-only dLLMs, dVLMs, and vision-language-action models for autonomous driving, and speeds up generation without fine-tuning the model.
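To illustrate what "parallel decoding" can mean for a diffusion LLM, here is a minimal toy sketch of one denoising step. It assumes a confidence-threshold rule: every still-masked position whose top-1 probability exceeds a threshold is committed in the same step, rather than committing one token per step. The names `parallel_unmask_step`, `MASK`, and `threshold` are hypothetical and are not the Fast-dLLM API; the repository's actual decoding and KV-cache logic may differ.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical marker for a position that is still masked (not yet decoded).
MASK = None


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_unmask_step(
    tokens: Sequence[Optional[int]],
    logits_per_pos: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed confidence cutoff, not taken from the repo
) -> List[Optional[int]]:
    """One toy denoising step with confidence-threshold parallel unmasking.

    Every masked position whose top-1 probability exceeds `threshold` is
    committed at once. If none qualifies, only the single most confident
    position is committed, so each step is guaranteed to make progress.
    """
    candidates = []  # (confidence, position, predicted token id)
    for i, tok in enumerate(tokens):
        if tok is not MASK:
            continue  # already decoded; keep it fixed
        probs = softmax(logits_per_pos[i])
        best = max(range(len(probs)), key=probs.__getitem__)
        candidates.append((probs[best], i, best))

    if not candidates:
        return list(tokens)  # nothing left to decode

    chosen = [c for c in candidates if c[0] > threshold]
    if not chosen:
        chosen = [max(candidates)]  # fall back to the most confident position

    out = list(tokens)
    for _, i, best in chosen:
        out[i] = best
    return out


if __name__ == "__main__":
    seq = [5, MASK, MASK, MASK]
    logits = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 10.0]]
    # Positions 1 and 3 are confident and get committed together; position 2 stays masked.
    print(parallel_unmask_step(seq, logits))
```

A full decoder would call the model, apply this step, and repeat until no `MASK` positions remain. Committing several tokens per model call reduces the number of forward passes, and that reduction is where the speedup from parallel decoding comes from.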