ELS-RD/transformer-deploy
An inference server that optimizes and deploys Hugging Face Transformer models for production with up to 10X speedup.

Transformer-deploy is an inference server for running Hugging Face Transformer models on CPU and GPU. It uses ONNX Runtime and NVIDIA Triton Inference Server to run inference faster than a standard PyTorch + FastAPI stack. Models can be deployed with a single command, and the tool targets production use cases such as semantic search and re-ranking.
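To make the serving side more concrete, here is a minimal sketch of how a client might build a request for a model hosted on Triton, using the HTTP/REST inference protocol (KServe v2) that Triton exposes. It is not taken from the project itself. The model name `transformer_onnx_model`, the tensor names `input_ids` and `attention_mask`, the `INT64` datatype, and the helper functions are all assumptions for illustration; the real names depend on how a given model was exported. The sketch only builds the URL and JSON body and does not contact a server.

```python
import json


def build_infer_request(input_ids, attention_mask):
    """Build a KServe v2 style JSON body from already tokenized inputs.

    Both arguments are lists of equal-length rows of token ids / mask values.
    Tensor names and the INT64 datatype are assumptions for this sketch.
    """
    if not input_ids or len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask need the same batch size")
    batch = len(input_ids)
    seq_len = len(input_ids[0])
    if any(len(row) != seq_len for row in input_ids + attention_mask):
        raise ValueError("all rows must have the same length")

    def tensor(name, rows):
        # Tensor data is sent flattened in row-major order next to its shape.
        return {
            "name": name,
            "shape": [batch, seq_len],
            "datatype": "INT64",
            "data": [value for row in rows for value in row],
        }

    return {
        "inputs": [
            tensor("input_ids", input_ids),
            tensor("attention_mask", attention_mask),
        ]
    }


def infer_url(host, model_name):
    """Return the v2 inference endpoint for a model (hypothetical model name)."""
    return f"http://{host}/v2/models/{model_name}/infer"


# Example: one sequence of three token ids (values are illustrative only).
body = build_infer_request([[101, 7592, 102]], [[1, 1, 1]])
url = infer_url("localhost:8000", "transformer_onnx_model")
payload = json.dumps(body)  # this string would be the POST body
```

With a server running, `payload` would be sent as a POST request to `url`. Tokenization is assumed to happen on the client in this sketch.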