raids-lab/crater
A Kubernetes-based AI development platform providing GPU resource management, containerized development environments, and workflow orchestration for training and inference.

Crater is a comprehensive AI platform built for Kubernetes that manages GPU resources and orchestrates training and inference workflows. It integrates with Ray for distributed computing, supports Jupyter Lab for interactive development, and connects to popular ML frameworks including PyTorch, TensorFlow, vLLM, and llama-factory. The platform provides workflow scheduling via Volcano, container management via buildkit and nerdctl, and includes monitoring capabilities for AI workloads.