kserve/kserve
A Kubernetes-native inference platform for serving LLMs and ML models at scale.

Velocity · 7d
+2.1
★ / day
Trend
→steady
star history
KServe provides a standardized, cloud-native way to deploy and serve generative and predictive AI models on Kubernetes. It supports multiple frameworks including vLLM for optimized LLM inference, PyTorch, TensorFlow, and XGBoost. The platform offers OpenAI-compatible inference endpoints, GPU acceleration, and scales to handle enterprise AI workloads, functioning as the inference runtime layer within the Kubeflow ecosystem.