microsoft/unilm
Microsoft's UniLM repository, containing research implementations of large-scale foundation models across language, vision, speech, and multimodal domains.

This repository consolidates Microsoft’s research on self-supervised pre-training for foundation models across tasks, languages, and modalities. It includes implementations of multimodal language models (the Kosmos series) and of architectures such as BitNet (1-bit LLMs), DeepNet (deep Transformers), RetNet, and X-MoE (sparse mixture-of-experts). It also provides code for training and evaluating large-scale pre-trained models, including LayoutLM for document understanding and XLM-E for multilingual pre-training.