wuba/dl_inference
A production-ready deep learning inference serving tool for TensorFlow, PyTorch, and Caffe models with automatic TensorRT optimization.

dl_inference is a general-purpose deep learning inference tool developed by 58.com (58同城) for rapidly deploying models trained with TensorFlow, PyTorch, or Caffe into production. It exposes a unified RPC service interface, supports both GPU and CPU deployment, and implements load balancing when a model is served from multiple nodes. It can also automatically convert TensorFlow SavedModel and PyTorch .pth models to TensorRT format to improve inference performance.
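
To make the idea of load balancing across multiple serving nodes concrete, here is a minimal round-robin sketch in Python. It is only an illustration and not dl_inference's actual implementation: the class `RoundRobinBalancer`, the `predict` helper, the `send` callback, and the node addresses are all hypothetical names introduced for this example, and the project may use a different balancing strategy.

```python
from itertools import cycle
from typing import Any, Callable, List


class RoundRobinBalancer:
    """Hand out serving nodes in a fixed rotating order (hypothetical helper)."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # cycle() repeats the node list endlessly in its original order.
        self._nodes = cycle(list(nodes))

    def next_node(self) -> str:
        """Return the address of the node that should take the next request."""
        return next(self._nodes)


def predict(balancer: RoundRobinBalancer,
            send: Callable[[str, Any], Any],
            payload: Any) -> Any:
    """Pick a node and forward the request through a caller-supplied transport.

    `send` stands in for whatever RPC client is actually used; it is an
    assumption of this sketch, not part of dl_inference.
    """
    node = balancer.next_node()
    return send(node, payload)
```

A caller would build the balancer once with the addresses of the nodes serving a model and route every request through `predict`, so that consecutive requests land on different nodes in turn.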