
Tencent/TurboTransformers

A C++/Python runtime that accelerates transformer model inference (BERT, GPT2, etc.) on CPU and GPU with variable-length batching support.

1.5k stars · C++ · Inference · Serving
Velocity (7d): +0.7 ★/day · Trend: steady

TurboTransformers is a high-performance inference engine for transformer models developed by Tencent. It provides fast CPU and GPU execution for encoder and decoder architectures including BERT, ALBERT, GPT2, and RoBERTa. The runtime supports variable-length inputs without preprocessing, smart batching to minimize zero-padding overhead, and integrates as a PyTorch plugin requiring minimal code changes. It offers both Python and C++ APIs and has been deployed in production serving scenarios including WeChat FAQ, sentiment analysis, and recommendation systems.
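To make the zero-padding point concrete, here is a minimal sketch of why grouping requests by length reduces padded work. It is an illustration only: the helper `padded_tokens` and the request lengths are hypothetical, it does not use the TurboTransformers API, and simple length sorting stands in for whatever scheduling the runtime actually performs.

```python
from typing import List


def padded_tokens(lengths: List[int], batch_size: int) -> int:
    """Count token slots computed when each consecutive group of
    `batch_size` sequences is padded to the longest sequence in its group."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Hypothetical request lengths (in tokens), listed in arrival order.
request_lengths = [5, 120, 8, 110, 6, 115]
real_tokens = sum(request_lengths)

# Batching in arrival order mixes short and long requests in one batch.
naive = padded_tokens(request_lengths, batch_size=3)

# Sorting by length first (an assumed heuristic) keeps similar lengths together.
grouped = padded_tokens(sorted(request_lengths), batch_size=3)

print(f"real tokens:         {real_tokens}")
print(f"arrival-order slots: {naive} (padding: {naive - real_tokens})")
print(f"length-sorted slots: {grouped} (padding: {grouped - real_tokens})")
```

With these made-up lengths, arrival-order batching computes 705 token slots for 364 real tokens, while length-sorted batching computes 384. A production scheduler would also have to weigh latency and per-batch overhead, which this sketch ignores.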
