
kaito-project/kaito

A Kubernetes operator that automates LLM inference, fine-tuning, and RAG deployment, serving models with vLLM and provisioning GPU resources automatically.

Velocity (7d): +1.0 ★ / day · Trend: steady

KAITO provides a CRD-based API to simplify deployment of large language models on Kubernetes clusters. It handles automated GPU node provisioning with accurate model memory estimation and applies preset configurations for parallelism strategies including pipeline, data, and tensor parallelism. The operator supports running vLLM-based inference servers and RAG engines without requiring users to manage detailed deployment parameters.
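To make the idea of model memory estimation concrete, here is a minimal Python sketch of the kind of arithmetic involved. It is not KAITO's actual estimator: the function names, the 20% overhead factor for KV cache and activations, and the 2-bytes-per-parameter default (fp16/bf16 weights) are all assumptions chosen for illustration.

```python
import math


def estimate_weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param=2 assumes fp16/bf16 weights (an assumption for this sketch).
    """
    return num_params * bytes_per_param / 2**30


def gpus_needed(
    num_params: float,
    gpu_mem_gib: float,
    bytes_per_param: int = 2,
    overhead_fraction: float = 0.2,  # hypothetical headroom for KV cache, activations
) -> int:
    """Rough count of GPUs required to hold the model when it is sharded evenly.

    This is a back-of-the-envelope estimate, not KAITO's real algorithm.
    """
    weights_gib = estimate_weight_gib(num_params, bytes_per_param)
    total_gib = weights_gib * (1 + overhead_fraction)
    return max(1, math.ceil(total_gib / gpu_mem_gib))


if __name__ == "__main__":
    for params, gpu_mem in [(7e9, 80), (70e9, 80), (70e9, 24)]:
        print(f"{params:.0e} params on {gpu_mem} GiB GPUs: {gpus_needed(params, gpu_mem)} GPU(s)")
```

An estimate like this is what lets an operator choose a node size and a parallelism degree up front. A real estimator would also account for context length, batch size, quantization, and constraints such as the tensor-parallel degree dividing the number of attention heads.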
