THUDM/SwissArmyTransformer
A PyTorch library providing shared backbone code for building and training Transformer variants including BERT, GPT, T5, GLM, CogView, and ViT.

Velocity (7d): +0.7 ★/day · Trend: steady
SwissArmyTransformer is a framework for developing custom Transformer models that share a unified architecture backbone. It supports various model architectures such as BERT, GPT, T5, GLM, CogView, and ViT through lightweight mixin components. The library integrates DeepSpeed ZeRO and model parallelism to enable efficient pretraining and fine-tuning of large-scale models ranging from 100M to 20B parameters.
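
To make the mixin idea concrete, here is a minimal pure-Python sketch of a shared backbone whose behavior is customized by small plug-in components. It does not use the library's actual API: `Backbone`, `Mixin`, `add_mixin`, the hook names `embed` and `final`, and both example mixins are hypothetical names chosen for illustration, and lists of floats stand in for tensors.

```python
from typing import Callable, Dict, List

Hook = Callable[[List[float]], List[float]]


class Mixin:
    """Hypothetical base class: a mixin contributes named hook functions."""

    def hooks(self) -> Dict[str, Hook]:
        return {}


class Backbone:
    """Toy stand-in for a shared backbone (not the library's own class)."""

    def __init__(self) -> None:
        self.mixins: Dict[str, Mixin] = {}
        # Default behavior for each customizable step: copy the input unchanged.
        self.default_hooks: Dict[str, Hook] = {
            "embed": lambda xs: list(xs),
            "final": lambda xs: list(xs),
        }

    def add_mixin(self, name: str, mixin: Mixin) -> None:
        if name in self.mixins:
            raise ValueError(f"mixin {name!r} is already registered")
        self.mixins[name] = mixin

    def _resolve(self, hook_name: str) -> Hook:
        # The most recently added mixin that provides the hook wins;
        # otherwise fall back to the backbone's default.
        for mixin in reversed(list(self.mixins.values())):
            provided = mixin.hooks()
            if hook_name in provided:
                return provided[hook_name]
        return self.default_hooks[hook_name]

    def forward(self, xs: List[float]) -> List[float]:
        hidden = self._resolve("embed")(xs)
        return self._resolve("final")(hidden)


class ScaleEmbeddingMixin(Mixin):
    """Replaces the embedding step with a simple scaling."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def hooks(self) -> Dict[str, Hook]:
        return {"embed": lambda xs: [x * self.factor for x in xs]}


class SumHeadMixin(Mixin):
    """Replaces the final step with a single pooled output."""

    def hooks(self) -> Dict[str, Hook]:
        return {"final": lambda xs: [sum(xs)]}


if __name__ == "__main__":
    model = Backbone()
    model.add_mixin("scale", ScaleEmbeddingMixin(2.0))
    model.add_mixin("head", SumHeadMixin())
    print(model.forward([1.0, 2.0]))
```

In this sketch the backbone code never changes; a model variant is defined entirely by which mixins are attached, which is the general pattern the description above refers to.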