NVIDIA/Megatron-LM
GPU-optimized library for training large transformer models at scale with advanced parallelism strategies.

This repository contains Megatron-LM and Megatron Core, two frameworks for distributed training of transformer models. Megatron-LM provides pre-configured training scripts aimed at research teams. Megatron Core provides composable, GPU-optimized building blocks for custom training pipelines: transformer architectures, parallelism strategies (tensor, pipeline, data, expert, and context parallelism), and mixed-precision support (FP16, BF16, FP8, FP4).
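
To make the parallelism terminology concrete, here is a minimal sketch of how the parallel degrees are commonly combined: the tensor, pipeline, and context degrees multiply together, and the data-parallel degree is whatever remains of the total GPU count. The helper name `data_parallel_size` and its parameters are hypothetical and written for illustration; this is plain Python, not a Megatron Core API. Expert parallelism is left out because the description above does not say how it composes with the other degrees.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel degree left over after model parallelism.

    Hypothetical helper for illustration only (not part of Megatron Core).
    Assumes world_size = tp * pp * cp * dp, where tp, pp, and cp are the
    tensor, pipeline, and context parallel degrees.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")

    # Number of GPUs needed to hold and run one model replica.
    gpus_per_replica = tp * pp * cp

    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tp*pp*cp={gpus_per_replica}"
        )

    # Each replica processes a different shard of the global batch.
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # 64 GPUs split as 4-way tensor, 2-way pipeline, 2-way context parallelism.
    print(data_parallel_size(64, tp=4, pp=2, cp=2))
```

Under these assumptions, 64 GPUs with 4-way tensor, 2-way pipeline, and 2-way context parallelism leave room for 4 data-parallel replicas. A combination that does not divide the GPU count evenly is rejected, since it cannot be laid out.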