
bytedance/effective_transformer

Optimized inference engine for BERT that dynamically removes and restores padding values to reduce memory and computation waste on variable-length sequences.

Velocity (7d): +0.2 ★/day · Trend: steady

Effective Transformer is a CUDA-accelerated inference optimization library built on NVIDIA FasterTransformer. Batching variable-length sequences normally means padding them into a uniform matrix, which wastes memory and computation on padding tokens. Effective Transformer addresses this by computing prefix sums of the attention mask so that only valid tokens are accessed. During computation stages, padding values are dynamically removed and restored, which significantly reduces execution time and memory consumption, especially for large batches with highly variable sequence lengths.
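The prefix-sum idea can be illustrated with a minimal CPU sketch in plain Python. This is not the library's CUDA code or its API: the function names `exclusive_prefix_sum`, `remove_padding`, and `restore_padding` are hypothetical, and the sketch assumes a 0/1 attention mask with one scalar per token instead of a hidden-state vector.

```python
from itertools import accumulate


def exclusive_prefix_sum(flat_mask):
    """For each flat position, count the valid tokens that come before it."""
    inclusive = list(accumulate(flat_mask))
    return [total - m for total, m in zip(inclusive, flat_mask)]


def remove_padding(tokens, mask):
    """Pack valid tokens contiguously (hypothetical helper, not the library API).

    tokens, mask: lists of rows, shape [batch][seq_len]; mask holds 0 or 1.
    Returns the packed tokens and, for each one, its original flat index.
    """
    flat_tokens = [t for row in tokens for t in row]
    flat_mask = [m for row in mask for m in row]
    prefix = exclusive_prefix_sum(flat_mask)
    n_valid = sum(flat_mask)
    packed = [None] * n_valid
    valid_idx = [0] * n_valid
    for i, m in enumerate(flat_mask):
        if m:
            # The prefix sum is the destination slot in the packed buffer.
            packed[prefix[i]] = flat_tokens[i]
            valid_idx[prefix[i]] = i
    return packed, valid_idx


def restore_padding(packed, valid_idx, batch, seq_len, pad=0):
    """Scatter packed tokens back into a padded [batch][seq_len] layout."""
    flat = [pad] * (batch * seq_len)
    for dst, value in zip(valid_idx, packed):
        flat[dst] = value
    return [flat[b * seq_len:(b + 1) * seq_len] for b in range(batch)]


tokens = [[11, 12, 0, 0], [21, 22, 23, 0]]
mask = [[1, 1, 0, 0], [1, 1, 1, 0]]
packed, valid_idx = remove_padding(tokens, mask)
restored = restore_padding(packed, valid_idx, batch=2, seq_len=4)
```

In this example, 5 of the 8 slots hold valid tokens, so any per-token work done on `packed` touches 5 elements instead of 8. The saving grows as sequence lengths within a batch become more uneven.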
