xlite-dev/Awesome-LLM-Inference

A categorized collection of research papers on LLM and VLM inference optimization with associated open-source implementations.

5.3k stars · Python · Learning · Inference · Serving
Velocity (7d): +5.2 ★/day · Trend: steady

This repository aggregates academic and engineering papers on large language model inference optimization. Topics include FlashAttention, PagedAttention, quantization (INT8/INT4), parallelism strategies, and inference runtimes such as vLLM and TensorRT-LLM. The list is organized by topic, and entries link to the code repositories of the referenced papers.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.