xlite-dev/Awesome-DiT-Inference
A curated list of research papers on diffusion model inference optimization covering sampling, caching, quantization, parallelism, and attention techniques.

This repository collects academic papers and their code implementations on optimizing diffusion model inference. Topics include sampling acceleration, caching strategies, quantization, distributed parallelism (such as Ring Attention and tensor parallelism), and attention optimization. The collection is aimed at practitioners working with image and video generation models such as Stable Diffusion, Sora, and Flux, with an emphasis on DiT (Diffusion Transformer) architectures.