
bytedance/ByteTransformer

ByteTransformer is a GPU-accelerated inference library optimized for BERT-like transformer models on NVIDIA GPUs.

Velocity (7d): +0.4 ★/day · Trend: steady

ByteTransformer is a high-performance inference library for BERT-like transformers that provides both Python and C++ APIs and integrates with PyTorch. It implements architecture-aware optimizations for BERT inference: padding-free algorithms, plus optimized QKV encoding, softmax, feed-forward network, and multi-head attention stages. The library has been deployed in production at ByteDance to serve transformer models, and it is reported to outperform PyTorch, TensorFlow, FasterTransformer, and DeepSpeed on A100 GPUs.
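To make the "padding-free" idea concrete, here is a minimal CPU sketch in plain Python of one common way to realize it: a prefix sum over the attention mask gives each valid token its slot in a packed buffer, so later stages only process real tokens, and the saved offsets restore the padded layout afterwards. The function names pack_tokens and unpack_tokens are hypothetical and are not part of ByteTransformer's API; the library itself runs on the GPU, and this sketch only illustrates the general technique under those assumptions.

```python
from itertools import accumulate


def pack_tokens(batch, mask):
    """Drop padded positions from a [batch][seq_len] layout.

    Hypothetical helper for illustration, not a ByteTransformer API.
    Returns the packed tokens and, for each one, its flat source index.
    """
    flat_tokens = [tok for row in batch for tok in row]
    flat_mask = [m for row in mask for m in row]

    # Inclusive prefix sum of the mask: for a valid token at flat index i,
    # prefix[i] - 1 is its destination slot in the packed buffer.
    prefix = list(accumulate(flat_mask))
    n_valid = prefix[-1] if prefix else 0

    packed = [None] * n_valid
    offsets = [0] * n_valid
    for src, (tok, keep) in enumerate(zip(flat_tokens, flat_mask)):
        if keep:
            dst = prefix[src] - 1
            packed[dst] = tok
            offsets[dst] = src
    return packed, offsets


def unpack_tokens(packed, offsets, batch_size, seq_len, pad=0):
    """Scatter packed tokens back into the padded [batch][seq_len] layout."""
    flat = [pad] * (batch_size * seq_len)
    for tok, src in zip(packed, offsets):
        flat[src] = tok
    return [flat[i * seq_len:(i + 1) * seq_len] for i in range(batch_size)]


# Two sequences of lengths 2 and 3, padded to length 4 with zeros.
batch = [[5, 6, 0, 0], [7, 8, 9, 0]]
mask = [[1, 1, 0, 0], [1, 1, 1, 0]]

packed, offsets = pack_tokens(batch, mask)
restored = unpack_tokens(packed, offsets, batch_size=2, seq_len=4)
```

In this toy batch, 5 of the 8 positions hold real tokens, so any per-token stage run on the packed buffer would touch 5 elements instead of 8. The saving grows as sequence lengths within a batch become more uneven.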
