basetenlabs/truss
A Python CLI framework for packaging, containerizing, and deploying ML models to production as API endpoints.

Truss provides a unified developer experience for serving ML models built with any framework, including transformers, diffusers, PyTorch, TensorFlow, vLLM, SGLang, and TensorRT-LLM. Users write model serving logic in Python, and Truss handles dependency management, GPU configuration, and containerization. It targets Baseten’s cloud infrastructure but can also deploy to self-managed environments. It supports live reload for fast iteration and built-in autoscaling for production workloads.
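
To make "model serving logic in Python" concrete, here is a minimal sketch of what that logic might look like. It assumes the `Model` class convention with `load` and `predict` methods, as used in Truss's documented examples. The uppercase "model" and the `text`/`output` keys are hypothetical placeholders, not part of Truss itself. The sketch runs on its own, without Truss installed.

```python
from typing import Any, Callable, Dict, Optional


class Model:
    """Toy serving class following the assumed load/predict shape."""

    def __init__(self, **kwargs: Any) -> None:
        # Assumption: the serving framework passes configuration as keyword
        # arguments. This sketch accepts and ignores them.
        self._model: Optional[Callable[[str], str]] = None

    def load(self) -> None:
        # Intended to run once at startup. A real implementation would load
        # weights here; this placeholder just uppercases text.
        self._model = lambda text: text.upper()

    def predict(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Intended to run once per request, with JSON-like input and output.
        if self._model is None:
            raise RuntimeError("load() must be called before predict()")
        return {"output": self._model(model_input["text"])}


if __name__ == "__main__":
    # Local smoke test that mimics the load-then-predict lifecycle.
    model = Model()
    model.load()
    print(model.predict({"text": "hello"}))
```

Keeping expensive setup in `load` and per-request work in `predict` means the class can be exercised locally like any other Python object before it is packaged into a container.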