
hao-ai-lab/Consistency_LLM

Consistency Large Language Models (CLLMs) use Jacobi decoding to generate multiple tokens in parallel, reducing LLM inference latency.

Velocity (7d): +0.4 ★ / day
Trend: steady

CLLMs are trained to map any randomly initialized n-token sequence to the same output that autoregressive decoding would produce, in as few Jacobi iterations as possible. Each iteration updates all n tokens in parallel, replacing the sequential token-by-token generation used in standard LLMs. The models demonstrate substantial generation speedups across various tasks while maintaining output quality.
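To make the fixed-point idea concrete, here is a minimal sketch of Jacobi decoding against a toy deterministic "model". It is not code from the repository: `toy_next_token`, `autoregressive_decode`, `jacobi_decode`, and `VOCAB_SIZE` are hypothetical names chosen for illustration, and the toy function only stands in for a greedy LLM step.

```python
import random

VOCAB_SIZE = 50  # assumption: tiny toy vocabulary


def toy_next_token(context):
    """Hypothetical stand-in for one greedy LLM step (deterministic in the context)."""
    return (sum((i + 1) * t for i, t in enumerate(context)) * 31 + 7) % VOCAB_SIZE


def autoregressive_decode(prompt, n):
    """Baseline: generate n tokens one at a time, each conditioned on all earlier ones."""
    out = []
    for _ in range(n):
        out.append(toy_next_token(prompt + out))
    return out


def jacobi_decode(prompt, n, seed=0):
    """Start from a random n-token guess and refine every position until nothing changes."""
    rng = random.Random(seed)
    guess = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
    steps = 0
    while True:
        # Position i is recomputed from the prompt plus the PREVIOUS guess[:i].
        # With a real model, these n updates would come from one batched forward pass.
        new_guess = [toy_next_token(prompt + guess[:i]) for i in range(n)]
        steps += 1
        if new_guess == guess:  # fixed point reached
            return guess, steps
        guess = new_guess


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, steps = jacobi_decode(prompt, n=8)
    print(tokens == autoregressive_decode(prompt, 8), steps)
```

At a fixed point every token equals the greedy prediction given the tokens before it, which is exactly the autoregressive output, so both decoders agree. In this sketch convergence is only guaranteed within n + 1 iterations, because each pass fixes at least one more leading token. An untrained toy model like this one therefore shows no speedup; the training objective described above is what aims to reduce the number of iterations.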
