triton-inference-server/server
NVIDIA's open-source inference server for optimized deep learning model serving on GPU, cloud, and edge.

Velocity (7d): +3.8 ★/day
Trend: steady
Triton Inference Server is a production inference serving platform for deploying models across GPUs, CPUs, and edge devices. It supports multiple deep learning frameworks and backends, and it targets low-latency inference in datacenter and edge environments.