microsoft/torchscale
Microsoft's PyTorch library for scaling Transformer architectures and developing foundation models including BitNet, RetNet, and LongNet.

TorchScale provides foundational architectures for large language models and multimodal systems. It implements research innovations such as DeepNet for stable scaling to very deep models, Magneto for general-purpose modeling across language, vision, and speech, and BitNet and RetNet as potential Transformer successors. The library focuses on training stability, modeling capability, and efficiency, using techniques such as sparse Mixture-of-Experts and length extrapolation for long contexts.
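
As a rough illustration of how these features are typically selected, the sketch below builds an encoder from a configuration object. The import paths and the `deepnorm` / `subln` flags follow the usage pattern in the project's README as best recalled here, so treat them as assumptions and check them against your installed version. `FEATURE_FLAGS`, `config_kwargs`, and `build_encoder` are hypothetical helper names introduced only for this example.

```python
# Hypothetical mapping from a feature name to the config flag assumed to enable it.
FEATURE_FLAGS = {
    "deepnet": {"deepnorm": True},  # DeepNet-style normalization (assumed flag name)
    "magneto": {"subln": True},     # Magneto sub-LayerNorm (assumed flag name)
}


def config_kwargs(vocab_size, *features):
    """Collect keyword arguments for a config object from feature names."""
    kwargs = {"vocab_size": vocab_size}
    for name in features:
        kwargs.update(FEATURE_FLAGS[name])
    return kwargs


def build_encoder(vocab_size=64000, *features):
    """Build an encoder; requires the torchscale package to be installed."""
    # Imported lazily so config_kwargs can be used without torchscale.
    from torchscale.architecture.config import EncoderConfig
    from torchscale.architecture.encoder import Encoder

    return Encoder(EncoderConfig(**config_kwargs(vocab_size, *features)))
```

With the package installed, `build_encoder(64000, "deepnet")` would be expected to return a PyTorch module configured for DeepNet-style training. Other capabilities named above, such as sparse Mixture-of-Experts, are left out of this sketch because they may need additional setup.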