horseee/Awesome-Efficient-LLM
A curated collection of papers and projects on efficient LLM techniques, including quantization, pruning, knowledge distillation, and inference acceleration.

This repository maintains an organized list of research papers and open-source projects on making large language models (LLMs) more efficient. Topics include network pruning, model quantization, knowledge distillation, inference acceleration, efficient architectures, and KV cache compression. The list is organized by sub-topic in separate Markdown files, and the repository also includes a project directory for implementations.