
microsoft/vidur

A high-fidelity and extensible simulator for modeling and analyzing LLM inference system performance across different hardware configurations and workloads.

Velocity (7d): +0.6 ★/day · Trend: steady

Vidur is a large-scale LLM inference system simulator developed by Microsoft. It accurately predicts metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT), and end-to-end request latency across different model and hardware combinations. The tool supports capacity planning by searching for optimal deployment configurations, and it enables rapid testing of research ideas such as new scheduling algorithms and speculative decoding. GPUs are needed only for an initial profiling phase; all subsequent simulation runs are GPU-free.
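To make the three latency metrics concrete, here is a minimal Python sketch of how they are commonly derived from per-token timestamps. It is not Vidur's API: the `RequestTrace` class, its fields, and the function names are hypothetical, and the TPOT formula (mean gap between consecutive output tokens) is one common convention that Vidur may or may not follow exactly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request record (an assumption, not a Vidur type)."""
    arrival_time: float       # seconds: when the request was submitted
    token_times: List[float]  # seconds: completion time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    # Delay between request arrival and the first output token.
    return trace.token_times[0] - trace.arrival_time


def end_to_end_latency(trace: RequestTrace) -> float:
    # Delay between request arrival and the last output token.
    return trace.token_times[-1] - trace.arrival_time


def time_per_output_token(trace: RequestTrace) -> float:
    # Mean gap between consecutive output tokens, excluding the first token.
    # With fewer than two tokens there is no gap to measure, so return 0.0.
    n = len(trace.token_times)
    if n < 2:
        return 0.0
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


# Made-up timestamps, for illustration only.
trace = RequestTrace(arrival_time=10.0, token_times=[10.5, 10.75, 11.0, 11.25])
ttft = time_to_first_token(trace)
tpot = time_per_output_token(trace)
e2e = end_to_end_latency(trace)
print(f"TTFT={ttft:.2f}s TPOT={tpot:.2f}s E2E={e2e:.2f}s")
```

Under this convention, end-to-end latency equals TTFT plus TPOT times the number of output tokens after the first.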
