tensorflow/serving
TensorFlow Serving is an open-source system for serving trained machine learning models in production environments with versioning, batching, and gRPC/HTTP endpoints.

It is a flexible, high-performance serving system built for the inference side of machine learning: it takes models after training and manages their lifetimes. It can serve multiple models, or multiple versions of the same model, at the same time, and gives clients versioned access to them. A built-in scheduler groups individual inference requests into batches for joint execution on GPU. TensorFlow Serving exposes both gRPC and HTTP inference endpoints, supports canarying new model versions and A/B testing experimental ones, and integrates natively with TensorFlow while remaining extensible to other ML frameworks and model types.
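
As a rough illustration of the HTTP endpoint and versioned access described above, the sketch below builds and sends a predict request using only the Python standard library. It assumes a server reachable on port 8501 (the port commonly used for the REST API; your deployment may differ) and a hypothetical model named my_model. The helper names predict_url, predict_body, and predict are made up for this example and are not part of TensorFlow Serving.

```python
import json
import urllib.request


def predict_url(host, model, version=None, port=8501):
    """Build the REST predict URL; the version segment is optional."""
    base = f"http://{host}:{port}/v1/models/{model}"
    if version is not None:
        # Pin the request to one specific model version.
        base += f"/versions/{version}"
    return base + ":predict"


def predict_body(instances):
    """Encode inputs in the row-oriented "instances" request format."""
    return json.dumps({"instances": instances}).encode("utf-8")


def predict(host, model, instances, version=None, timeout=5.0):
    """POST a predict request and return the decoded "predictions" field."""
    req = urllib.request.Request(
        predict_url(host, model, version),
        data=predict_body(instances),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

Calling predict("localhost", "my_model", [[1.0, 2.0, 5.0]]) would target whichever version the server currently treats as the default, while passing version=2 pins the request to that version, which is one simple way a client could compare two versions side by side. The shape of each instance depends on the model's signature, so the input shown here is only a placeholder.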