Tencent/TurboTransformers
A C++/Python runtime that accelerates transformer model inference (BERT, GPT2, etc.) on CPU and GPU with variable-length batching support.

TurboTransformers is a high-performance inference engine for transformer models developed by Tencent. It provides fast CPU and GPU execution for encoder and decoder architectures including BERT, ALBERT, GPT2, and RoBERTa. The runtime accepts variable-length inputs without preprocessing, applies smart batching to minimize zero-padding overhead, and integrates as a PyTorch plugin with minimal code changes. It offers both Python and C++ APIs and has been deployed in production serving scenarios including WeChat FAQ, sentiment analysis, and recommendation systems.
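To make the zero-padding point concrete, here is a minimal sketch in plain Python of why grouping requests by length reduces padding. It does not use the TurboTransformers API and is not the project's batching algorithm; the helper names `padded_tokens` and `length_sorted_batches`, the sample lengths, and the batch size are all hypothetical, chosen only for illustration.

```python
from typing import List


def padded_tokens(lengths: List[int]) -> int:
    """Count the zero-padding tokens needed to pad a batch to its longest sequence."""
    if not lengths:
        return 0
    return max(lengths) * len(lengths) - sum(lengths)


def length_sorted_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Hypothetical grouping strategy: sort by length, then cut into fixed-size batches."""
    ordered = sorted(lengths)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Illustrative token counts for six incoming requests (made-up values).
request_lengths = [5, 120, 7, 118, 6, 125]

# One big batch: every sequence is padded to the longest one (125 tokens).
naive_padding = padded_tokens(request_lengths)

# Two length-sorted batches of three: short and long requests are kept apart.
grouped_padding = sum(
    padded_tokens(batch) for batch in length_sorted_batches(request_lengths, 3)
)

print(f"padding in one batch:        {naive_padding}")
print(f"padding in length-sorted batches: {grouped_padding}")
```

With these sample lengths, the single batch spends most of its tokens on padding, while the length-sorted batches spend very little. A production scheduler would also have to weigh latency and per-batch launch cost, which this sketch ignores.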