Efficient-ML/Awesome-Model-Quantization

A continuously updated collection of papers, benchmarks, and resources on model quantization techniques for compressing and accelerating neural networks and LLMs.

★2.4k stars · Inference · Serving · Learning · ML Frameworks

Velocity (7d): +0.9 ★/day · Trend: steady

This repository aggregates academic papers, benchmarks, documentation, and implementations focused on model quantization, an optimization technique that reduces model size, memory footprint, and computational cost while aiming to preserve accuracy. It includes references organized by year, survey papers, benchmark tools such as BiBench, and implementations for binarization and LLM quantization. The repository serves as a reference for researchers and practitioners working on efficient deep learning inference.