hao-ai-lab/LookaheadDecoding
Lookahead Decoding is an ICML 2024 paper introducing a parallel decoding algorithm that accelerates LLM inference without requiring draft models or data stores.

The project implements a parallel decoding algorithm for language model inference that breaks the sequential dependency of autoregressive decoding. Building on Jacobi iteration, it generates candidate n-grams and verifies them in parallel, so a single decoding step can accept more than one token. The approach achieves wall-clock speedups on LLaMA-2-Chat 7B generation and supports integration with FlashAttention and sampling techniques.
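
To make the Jacobi idea concrete, here is a minimal sketch of Jacobi-style decoding on a toy model. It is not the repository's implementation. It covers only the fixed-point iteration that the method builds on and leaves out the n-gram pool and verification step. The names `toy_next_token`, `greedy_decode`, and `jacobi_decode` are hypothetical and exist only for this illustration, and the toy function is an assumed stand-in for a greedy LLM step.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def toy_next_token(prefix: List[Token]) -> Token:
    """Hypothetical stand-in for a greedy LLM step: deterministic in the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def greedy_decode(next_token: NextTokenFn, prompt: Sequence[Token], n: int) -> List[Token]:
    """Baseline autoregressive decoding: one token per step, n sequential steps."""
    out = list(prompt)
    for _ in range(n):
        out.append(next_token(out))
    return out[len(prompt):]


def jacobi_decode(
    next_token: NextTokenFn, prompt: Sequence[Token], n: int, init: Token = 0
) -> Tuple[List[Token], int]:
    """Jacobi-style decoding: refine an n-token guess until it stops changing.

    Returns the decoded tokens and the number of iterations used.
    """
    guess = [init] * n
    for step in range(1, n + 1):
        # Every position is recomputed from the PREVIOUS guess, so the n
        # updates are independent of each other. A real model would compute
        # them in one batched forward pass; this loop only simulates that.
        new_guess = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new_guess == guess:
            # Fixed point reached: identical to the greedy output.
            return guess, step
        guess = new_guess
    # After n iterations the first n tokens are guaranteed to be correct.
    return guess, n


if __name__ == "__main__":
    prompt = [3, 1, 4]
    tokens, steps = jacobi_decode(toy_next_token, prompt, 8)
    print("jacobi:", tokens, "iterations:", steps)
    print("greedy:", greedy_decode(toy_next_token, prompt, 8))
```

Each iteration fixes at least one more leading token, so the loop needs at most `n` iterations and ends on the same sequence as greedy decoding. A speedup can only come from iterations that fix more than one token at a time.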