← all repositories

vllm-project/production-stack

A Kubernetes-native reference stack for deploying and scaling vLLM LLM inference across distributed clusters.

production-stack
Velocity · 7d
+4.7
★ / day
Trend
steady
star history

The vLLM Production Stack provides a reference implementation for deploying vLLM inference engines at scale in production. It enables scaling from single instances to distributed Kubernetes deployments without changing application code. The stack includes web-based monitoring dashboards, request routing for load distribution, and KV cache offloading to optimize inference performance across cluster-wide deployments.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.