kaito-project/kaito
A Kubernetes operator that automates LLM inference, fine-tuning, and RAG deployments, serving models with vLLM and provisioning GPU resources automatically.

Velocity (7d): +1.0 ★ / day · Trend: steady
KAITO provides a CRD-based API that simplifies deploying large language models on Kubernetes clusters. It provisions GPU nodes automatically, sizing them from an accurate estimate of each model's memory requirements, and applies preset configurations for pipeline, data, and tensor parallelism. The operator runs vLLM-based inference servers and RAG engines without requiring users to manage detailed deployment parameters.
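
To make the CRD-based API concrete, here is a minimal sketch in Python that builds a manifest shaped like a KAITO Workspace custom resource and prints it as JSON. The API version (`kaito.sh/v1beta1`), the field layout (`resource.instanceType`, `resource.labelSelector`, `inference.preset.name`), the instance type, and the preset name are assumptions based on the project's published examples and may differ between releases; the helper `build_workspace` is hypothetical and not part of KAITO.

```python
import json


def build_workspace(name, instance_type, preset_name, app_label):
    """Return a dict shaped like a KAITO Workspace custom resource.

    All field names are assumptions; check the CRD installed in your
    cluster before applying anything generated this way.
    """
    return {
        "apiVersion": "kaito.sh/v1beta1",  # assumed API group/version
        "kind": "Workspace",
        "metadata": {"name": name},
        "resource": {
            # GPU VM size the operator is asked to provision (placeholder)
            "instanceType": instance_type,
            # Label used to match the provisioned nodes to this workspace
            "labelSelector": {"matchLabels": {"apps": app_label}},
        },
        # A preset name stands in for detailed deployment parameters
        "inference": {"preset": {"name": preset_name}},
    }


manifest = build_workspace(
    name="workspace-demo",
    instance_type="Standard_NC24ads_A100_v4",  # placeholder value
    preset_name="phi-4-mini-instruct",         # placeholder value
    app_label="demo-llm",
)
print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests, the printed output could in principle be piped to `kubectl apply -f -` on a cluster where the operator and its CRDs are installed. The point of the sketch is that the user names a model preset and a GPU instance type, and the parallelism and serving settings are left to the operator.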