bentoml/BentoML
A Python library for building and deploying model inference APIs and multi-model serving systems.

Velocity (7d): +3.3 ★/day
Trend: steady
BentoML provides tools to turn any AI/ML model inference script into a REST API server with minimal code. It handles dependency management, Docker image generation, and deployment reproducibility through simple config files. The framework includes built-in serving optimizations such as dynamic batching and model parallelism to maximize CPU and GPU utilization for high-performance inference workloads.
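To make the dynamic batching idea concrete, here is a minimal, standard-library-only sketch of the general technique: concurrent single-item requests are queued, grouped into a batch until either a size cap or a wait deadline is reached, and then passed to the model in one call. This is not BentoML's implementation or API. The names `DynamicBatcher`, `double_all`, `max_batch_size`, and `max_wait_s` are hypothetical and chosen only for illustration.

```python
import asyncio


class DynamicBatcher:
    """Hypothetical helper: groups concurrent single-item requests into batches."""

    def __init__(self, batch_fn, max_batch_size=4, max_wait_s=0.01):
        self.batch_fn = batch_fn              # takes a list of inputs, returns a list of outputs
        self.max_batch_size = max_batch_size  # upper bound on items per model call
        self.max_wait_s = max_wait_s          # how long to wait for a batch to fill
        self.queue = asyncio.Queue()
        self.batch_sizes = []                 # recorded only so the demo can inspect it

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        """Worker loop: collect a batch, run the model once, fan results back out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until at least one request arrives
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


def double_all(xs):
    """Stand-in for a model's batched inference function (assumption)."""
    return [2 * x for x in xs]


async def demo():
    batcher = DynamicBatcher(double_all, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    # Six concurrent requests, each submitting a single item.
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

Each caller still sees a single-request interface, while the model function runs on groups of at most `max_batch_size` inputs. The trade-off is that a larger `max_wait_s` tends to produce fuller batches at the cost of added per-request latency.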