NVIDIA/TransformerEngine
NVIDIA's library for accelerating Transformer model training and inference on GPUs using FP8/FP4 precision.

Velocity (7d): +2.5 ★/day
Trend: steady
TransformerEngine is a performance library that provides optimized transformer layers, attention backends, and fused kernels for NVIDIA GPUs, including the Hopper, Ada, and Blackwell architectures. It supports low-precision training and inference in the FP8 and NVFP4 formats, which reduces memory usage and increases throughput. It integrates with PyTorch and JAX, and with LLM frameworks such as NeMo and Megatron-LM.
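To make the memory claim concrete, here is a rough back-of-the-envelope sketch of how weight storage shrinks as the bit width drops. It is not TransformerEngine code: the helper names and the 7-billion-parameter model size are hypothetical, and the arithmetic counts only the raw values, ignoring the scaling factors that FP8 and 4-bit formats store alongside the data.

```python
# Rough estimate of weight storage at different precisions.
# Assumption: only raw values are counted; scaling-factor metadata
# used by FP8/4-bit formats is ignored, so real footprints are a bit larger.

BITS_PER_VALUE = {"fp32": 32, "bf16": 16, "fp8": 8, "fp4": 4}


def weight_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed to store num_params values in the given format (rounded up)."""
    total_bits = num_params * BITS_PER_VALUE[fmt]
    return (total_bits + 7) // 8


def weight_gib(num_params: int, fmt: str) -> float:
    """Same estimate expressed in GiB."""
    return weight_bytes(num_params, fmt) / 2**30


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

for fmt in BITS_PER_VALUE:
    print(f"{fmt:>5}: {weight_gib(NUM_PARAMS, fmt):6.2f} GiB")
```

Under these assumptions, each halving of the bit width halves the weight footprint, so FP8 needs a quarter of the FP32 storage and a 4-bit format an eighth. Activations, optimizer state, and scaling metadata are not covered by this estimate.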