EleutherAI/gpt-neox
A library for training large-scale autoregressive language models on GPUs, built on Megatron and DeepSpeed.

GPT-NeoX is an implementation of model-parallel autoregressive transformers designed for training billion-parameter language models. It builds on NVIDIA’s Megatron-LM framework and incorporates techniques from DeepSpeed, including novel optimizations for distributed training. The library supports diverse HPC infrastructures including Slurm, MPI, and cloud platforms, and is used by researchers at academic, industry, and government labs for large-scale LLM research.