ai-dynamo/dynamo

Your inference engine knows one GPU; this knows the whole rack

Dynamo exists because no single inference engine can coordinate KV caches, prefill pools, and autoscaling across an entire datacenter rack on its own.

★ 7.6k stars · Rust · Inference · Serving · LLMOps · Eval

Velocity (7 days): +8.4 ★/day · Trend: steady

What it does

Dynamo sits above existing inference engines like vLLM, SGLang, and TensorRT-LLM, turning a scattered cluster of GPUs into a single coordinated system. It handles request routing, disaggregates prefill and decode into independently scalable pools, and can offload KV cache across GPU, CPU, SSD, and even remote storage. The stack is built in Rust for the hot path and exposes Python for extensibility, offering both standalone and Kubernetes Gateway deployment modes.
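To make the tiered offload idea concrete, here is a minimal sketch of a KV block store that demotes least-recently-used blocks from GPU to CPU to SSD to remote storage and promotes a block back to the top tier when it is read again. The class name, tier names, capacities, and the plain LRU policy are all illustrative assumptions. This is not Dynamo's KV Block Manager API, and the page does not describe which eviction policy Dynamo actually uses.

```python
from collections import OrderedDict

# Hypothetical tier order, fastest to slowest; not Dynamo's configuration.
TIERS = ["gpu", "cpu", "ssd", "remote"]


class TieredKVCache:
    """Toy multi-tier KV block store: LRU blocks are demoted one tier down."""

    def __init__(self, capacities):
        # capacities: tier name -> max blocks; the last tier is treated as unbounded.
        self.capacities = capacities
        self.tiers = {name: OrderedDict() for name in TIERS}

    def put(self, block_id, data):
        # Drop any stale copy in a lower tier, then insert at the fastest tier.
        for tier in self.tiers.values():
            tier.pop(block_id, None)
        self._insert(0, block_id, data)

    def _insert(self, level, block_id, data):
        tier = self.tiers[TIERS[level]]
        tier[block_id] = data
        tier.move_to_end(block_id)  # mark as most recently used
        is_last = level == len(TIERS) - 1
        if not is_last and len(tier) > self.capacities[TIERS[level]]:
            # Evict the least-recently-used block into the next, slower tier.
            victim_id, victim_data = tier.popitem(last=False)
            self._insert(level + 1, victim_id, victim_data)

    def get(self, block_id):
        """Return (tier it was found in, data) and promote it to GPU; None on miss."""
        for name in TIERS:
            tier = self.tiers[name]
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)
                return name, data
        return None

    def location(self, block_id):
        """Name of the tier currently holding block_id, or None."""
        for name in TIERS:
            if block_id in self.tiers[name]:
                return name
        return None
```

A cache built with `TieredKVCache({"gpu": 2, "cpu": 2, "ssd": 2})` and filled with six blocks keeps the two newest on GPU and pushes the oldest two down to SSD; reading an SSD block moves it back to GPU and cascades one demotion per full tier.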

The interesting bit

The standout idea is KV-aware routing: by tracking which workers already hold relevant KV cache state, Dynamo can send follow-up requests to the right GPU and skip redundant prefill work. NVIDIA claims this cuts time-to-first-token in half on some workloads. It also treats model weights as streamable assets via ModelExpress, because waiting minutes for a new replica to warm up is a tax nobody wants to pay twice.
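The routing decision can be sketched as a cost comparison: hash the prompt in fixed-size token blocks, with each hash chained to the previous one so that it identifies the whole prefix, count how many leading blocks each worker already caches, and pick the worker with the fewest blocks left to prefill plus a penalty for its current load. The block size, hashing scheme, cost formula, `load_weight`, and all names below are hypothetical choices for illustration, not Dynamo's router implementation.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash each full token block so a hash identifies the entire prefix."""
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        chunk = ",".join(map(str, tokens[start:start + block_size])).encode()
        prev = hashlib.sha256(prev + chunk).digest()
        hashes.append(prev)
    return hashes


@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached: set = field(default_factory=set)  # prefix-block hashes held in KV cache


def overlap_blocks(worker, hashes):
    """Number of leading prefix blocks this worker already has cached."""
    count = 0
    for h in hashes:
        if h not in worker.cached:
            break
        count += 1
    return count


def route(workers, tokens, load_weight=1.0):
    """Choose the worker minimizing (blocks to prefill) + load_weight * active requests."""
    hashes = block_hashes(tokens)

    def cost(w):
        return (len(hashes) - overlap_blocks(w, hashes)) + load_weight * w.active_requests

    best = min(workers, key=cost)
    best.active_requests += 1
    best.cached.update(hashes)  # assumption: the chosen worker now holds this prefix
    return best
```

A follow-up request that shares a cached system prompt lands on the worker that already holds it, unless that worker's queue is deep enough that recomputing the prefix elsewhere is cheaper; `load_weight` is the knob that trades cache reuse against load balance.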

Key highlights

Disaggregated serving: prefill and decode run in separate, independently scalable GPU pools.
KV Block Manager offloads cache across four storage tiers (GPU → CPU → SSD → remote/blob).
SLA-driven Planner autoscales pools to meet latency targets while minimizing TCO.
Fault tolerance via canary health checks and in-flight request migration.
Supports multimodal and video generation workloads, including FastVideo and SGLang Diffusion.
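Disaggregation is what lets a planner size prefill and decode pools separately. The sketch below shows only the steady-state arithmetic such a planner might do: given a request rate, average prompt and output lengths, and a per-replica token throughput measured while still meeting the latency target, it rounds demand plus headroom up to whole replicas. The profile values, the headroom factor, and every name here are assumptions; the page does not describe how Dynamo's SLA-driven Planner actually profiles, predicts, or scales.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolProfile:
    # Hypothetical per-replica throughput (tokens/s) that still meets the SLA:
    # TTFT target for prefill, inter-token latency target for decode.
    tokens_per_sec_at_sla: float


def replicas_needed(request_rate, tokens_per_request, profile, headroom=0.1, min_replicas=1):
    """Smallest replica count whose SLA-constrained capacity covers demand plus headroom."""
    demand = request_rate * tokens_per_request * (1.0 + headroom)
    return max(min_replicas, math.ceil(demand / profile.tokens_per_sec_at_sla))


def plan(request_rate, avg_input_tokens, avg_output_tokens, prefill, decode, headroom=0.1):
    """Size the prefill and decode pools independently of each other."""
    return {
        "prefill": replicas_needed(request_rate, avg_input_tokens, prefill, headroom),
        "decode": replicas_needed(request_rate, avg_output_tokens, decode, headroom),
    }
```

With made-up numbers (20 requests/s, 2,000-token prompts, 250-token outputs, 40,000 tok/s per prefill replica, 2,000 tok/s per decode replica), `plan` returns 2 prefill and 3 decode replicas. Prompt-heavy traffic grows the prefill pool without paying for idle decode GPUs, which is the utilization argument for disaggregation.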

Caveats

Several headline benchmarks (7× throughput, 2× faster time-to-first-token) come from vendor-linked or third-party reports rather than from numbers reproducible in the repository.
KV Block Manager support for SGLang is marked 🚧 in the feature matrix.
The “zero-config” Kubernetes deployer is still in beta.

Verdict

If you are running LLM inference across multiple nodes and need to squeeze utilization out of an expensive GPU fleet, this is worth evaluating. If you are serving a single model on a single GPU, the README itself admits your inference engine is already enough.

Frequently asked

What is ai-dynamo/dynamo?: Dynamo exists because no single inference engine can coordinate KV caches, prefill pools, and autoscaling across an entire datacenter rack on its own.
Is dynamo open source?: Yes — ai-dynamo/dynamo is an open-source project tracked on heatdrop.
What language is dynamo written in?: ai-dynamo/dynamo is primarily written in Rust.
How popular is dynamo?: ai-dynamo/dynamo has 7.6k stars on GitHub and is currently holding steady.
Where can I find dynamo?: ai-dynamo/dynamo is on GitHub at https://github.com/ai-dynamo/dynamo.