
ELS-RD/transformer-deploy

An inference server that optimizes and deploys Hugging Face Transformer models for production with up to 10X speedup.

Velocity (7d): +1.0 ★/day · Trend: steady

Transformer-deploy provides an efficient, scalable inference server for running Hugging Face Transformer models on CPU and GPU. It uses ONNX Runtime and NVIDIA Triton Inference Server to accelerate inference relative to a standard PyTorch + FastAPI stack. The tool offers single-command deployment and targets enterprise-grade production scenarios, including semantic search and re-ranking.
