ome-projects/ome
A Kubernetes operator that automates LLM deployment, GPU resource scheduling, and runtime selection for enterprise model serving.

OME (Open Model Engine) provides enterprise-grade management and serving of large language models (LLMs) on Kubernetes. It treats models as first-class custom resources and automatically extracts architecture and parameter information from model files. The operator scores candidate runtimes, such as vLLM, SGLang, TensorRT-LLM, and Triton, against each model's architecture and matches the model to the best fit. It also handles distributed storage, multiple model formats (SafeTensors, PyTorch, TensorRT, ONNX), and GPU scheduling across clusters.
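Architecture-based runtime matching can be pictured as a scoring pass over candidate runtimes. The sketch below illustrates the general idea under stated assumptions. It is not OME's implementation or API. `ModelInfo`, `RuntimeSpec`, `extract_model_info`, `score_runtime`, `select_runtime`, the runtime names, and the scoring weights are all hypothetical. The only real convention it relies on is the `architectures` list found in Hugging Face-style `config.json` files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    architecture: str  # e.g. "LlamaForCausalLM"
    model_format: str  # e.g. "safetensors", "pytorch", "tensorrt", "onnx"


@dataclass(frozen=True)
class RuntimeSpec:
    # Hypothetical runtime description for illustration; not OME's CRD schema.
    name: str
    architectures: frozenset  # supported model classes; "*" means generic support
    formats: frozenset        # supported weight formats
    priority: int = 0         # hypothetical preference added to the score


def extract_model_info(config: dict, model_format: str) -> ModelInfo:
    """Read the architecture from a Hugging Face-style config.json dict."""
    archs = config.get("architectures") or []
    if not archs:
        raise ValueError("config has no 'architectures' entry")
    return ModelInfo(architecture=archs[0], model_format=model_format.lower())


def score_runtime(runtime: RuntimeSpec, model: ModelInfo):
    """Return a score, or None if the runtime cannot serve the model at all."""
    if model.model_format not in runtime.formats:
        return None  # hard constraint: the weight format must be supported
    if model.architecture in runtime.architectures:
        arch_score = 10  # explicit support for this architecture
    elif "*" in runtime.architectures:
        arch_score = 1  # generic fallback support
    else:
        return None
    return arch_score + runtime.priority


def select_runtime(model: ModelInfo, runtimes: list) -> RuntimeSpec:
    """Pick the highest-scoring eligible runtime for the model."""
    candidates = []
    for rt in runtimes:
        s = score_runtime(rt, model)
        if s is not None:
            candidates.append((s, rt))
    if not candidates:
        raise LookupError(
            f"no runtime supports {model.architecture} in {model.model_format} format"
        )
    # Highest score wins; max() keeps the first runtime listed on ties.
    return max(candidates, key=lambda c: c[0])[1]


# Hypothetical runtime catalogue: names and support lists are made up.
RUNTIMES = [
    RuntimeSpec("generic-runtime", frozenset({"*"}),
                frozenset({"safetensors", "pytorch"}), priority=1),
    RuntimeSpec("llama-optimized", frozenset({"LlamaForCausalLM"}),
                frozenset({"safetensors"}), priority=2),
    RuntimeSpec("compiled-engine", frozenset({"LlamaForCausalLM"}),
                frozenset({"tensorrt"}), priority=3),
]

if __name__ == "__main__":
    llama = extract_model_info({"architectures": ["LlamaForCausalLM"]}, "SafeTensors")
    print(select_runtime(llama, RUNTIMES).name)  # llama-optimized
    mistral = extract_model_info({"architectures": ["MistralForCausalLM"]}, "safetensors")
    print(select_runtime(mistral, RUNTIMES).name)  # generic-runtime
```

In this sketch the weight format acts as a hard filter before any scoring, so a runtime cannot win on priority alone if it cannot load the files. A runtime with explicit support for an architecture outranks a generic fallback.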