microsoft/vidur
A high-fidelity and extensible simulator for modeling and analyzing LLM inference system performance across different hardware configurations and workloads.

Vidur is a large-scale LLM inference system simulator developed by Microsoft. It predicts metrics such as time-to-first-token, time-per-output-token, and end-to-end request latency with high fidelity across different model and hardware combinations. The tool supports capacity planning, helping find optimal deployment configurations, and enables rapid testing of research ideas such as new scheduling algorithms and speculative decoding. Its GPU requirements are minimal: GPUs are needed only for an initial profiling phase, and all subsequent simulation runs are GPU-free.
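The three latency metrics above can be made concrete with a small sketch. It assumes each request is described by its arrival time and the completion time of every output token. The `RequestTrace` class and its methods are hypothetical and not part of Vidur's API. TPOT here follows one common convention, the mean gap between consecutive output tokens after the first. Vidur's exact metric definitions may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record (not Vidur's API)."""

    arrival_time: float       # seconds: when the request entered the system
    token_times: List[float]  # seconds: completion time of each output token, in order

    def ttft(self) -> float:
        # Time to first token: arrival until the first output token (prefill + queueing)
        return self.token_times[0] - self.arrival_time

    def tpot(self) -> float:
        # Time per output token: mean gap between consecutive tokens after the first.
        # Assumed convention; the first token is excluded because prefill dominates it.
        n = len(self.token_times)
        if n < 2:
            return 0.0
        return (self.token_times[-1] - self.token_times[0]) / (n - 1)

    def e2e_latency(self) -> float:
        # End-to-end latency: arrival until the last output token completes
        return self.token_times[-1] - self.arrival_time


trace = RequestTrace(arrival_time=0.0, token_times=[0.25, 0.30, 0.35, 0.40])
print(trace.ttft(), trace.tpot(), trace.e2e_latency())
```

For this trace, TTFT is 0.25 s, TPOT is about 0.05 s, and end-to-end latency is 0.40 s. A simulator like Vidur estimates these token timestamps from profiled operator costs rather than by running the model.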