NVIDIA/FasterTransformer
NVIDIA's optimized transformer inference library for BERT and GPT models on Volta, Turing, and Ampere GPUs.

Velocity (7d): +3.4 ★/day
Trend: steady
FasterTransformer provides highly optimized encoder and decoder transformer components for inference on NVIDIA GPUs. It supports the BERT and GPT model families and integrates with PyTorch and TensorFlow. Active development has moved to TensorRT-LLM, but the library remains available for existing use cases.