bytedance/ByteTransformer
ByteTransformer is a GPU-accelerated inference library optimized for BERT-like transformer models on NVIDIA GPUs.

ByteTransformer is a high-performance inference library for BERT-like transformers that provides both Python and C++ APIs, with PyTorch integration. It implements architecture-aware optimizations for BERT inference, including a padding-free algorithm and optimized QKV encoding, softmax, feed-forward network, and multi-head attention routines. The library has been deployed in production at ByteDance to serve transformer models, and it is reported to outperform PyTorch, TensorFlow, FasterTransformer, and DeepSpeed on A100 GPUs.
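
To illustrate the general idea behind a padding-free approach, here is a minimal pure-Python sketch. It is not ByteTransformer's API or its GPU implementation; the function names (pack_tokens, unpack_tokens, sequence_offsets) and the toy data are hypothetical, chosen only for illustration. It assumes each sequence in a batch is padded to a common length and that the true lengths are known, so padded positions can be dropped before per-token work and restored afterwards.

```python
from itertools import accumulate


def sequence_offsets(lengths):
    """Exclusive prefix sum: where each sequence starts in the packed buffer."""
    return [0] + list(accumulate(lengths))[:-1]


def pack_tokens(batch, lengths):
    """Keep only the valid tokens of each padded sequence.

    Returns the packed tokens and, for each one, its (sequence, position)
    coordinates in the original padded batch.
    """
    packed, index = [], []
    for b, (seq, n) in enumerate(zip(batch, lengths)):
        for t in range(n):
            packed.append(seq[t])
            index.append((b, t))
    return packed, index


def unpack_tokens(packed, index, batch_size, max_len, pad=0):
    """Scatter packed tokens back into a padded [batch_size][max_len] layout."""
    out = [[pad] * max_len for _ in range(batch_size)]
    for value, (b, t) in zip(packed, index):
        out[b][t] = value
    return out


# Toy batch (hypothetical data): three sequences padded to length 4 with zeros.
batch = [[11, 12, 13, 0], [21, 0, 0, 0], [31, 32, 0, 0]]
lengths = [3, 1, 2]

packed, index = pack_tokens(batch, lengths)
offsets = sequence_offsets(lengths)
restored = unpack_tokens(packed, index, batch_size=3, max_len=4)
```

In this toy batch only 6 of the 12 slots hold real tokens, so any per-token computation run on the packed buffer touches half as many elements. The saving grows with the spread between the average and the maximum sequence length.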