← all repositories

kubeai-project/kubeai

A Kubernetes operator that deploys and scales ML inference servers including vLLM, Ollama, FasterWhisper, and embedding models.

kubeai
Velocity · 7d
+1.3
★ / day
Trend
steady
star history

KubeAI is an AI inference operator for Kubernetes that simplifies serving machine learning models in production. It supports large language models via vLLM and Ollama, speech-to-text via FasterWhisper, vector embeddings via Infinity, and reranking with cross-encoder models. The system provides intelligent autoscaling from zero, model caching with dynamic adapters for LoRA, and an OpenAI-compatible API interface while requiring no external dependencies like Istio or Knative.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.