bigscience-workshop/Megatron-DeepSpeed
A fork of Megatron-LM and Megatron-DeepSpeed for training large transformer language models at scale, used by the BigScience project.

This repository provides tools for training transformer language models, including BERT and GPT-2, at scale. It integrates NVIDIA's Megatron-LM with Microsoft's DeepSpeed, pairing Megatron's tensor (model) parallelism with DeepSpeed's pipeline parallelism and ZeRO optimizer for distributed training. It is the primary codebase of BigScience, a collaborative research initiative focused on large language model development.
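Megatron-DeepSpeed-style training typically splits the available GPUs across tensor-parallel (TP), pipeline-parallel (PP) and data-parallel (DP) groups, so that `world_size = TP × PP × DP`. The sketch below illustrates that bookkeeping and the related batch-size arithmetic (`global_batch = micro_batch × DP × grad_accum_steps`). It is a minimal sketch under that assumption, not code from this repository: `ParallelLayout` and `grad_accum_steps` are hypothetical helpers, and the numbers in the usage example are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: how GPUs are divided among the three parallelism axes."""

    world_size: int  # total number of GPUs (ranks)
    tp: int  # tensor-parallel degree (layers split across GPUs)
    pp: int  # pipeline-parallel degree (model split into stages)

    @property
    def dp(self) -> int:
        """Data-parallel degree: ranks left after tensor and pipeline splitting."""
        model_parallel = self.tp * self.pp
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={self.world_size} is not divisible by tp*pp={model_parallel}"
            )
        return self.world_size // model_parallel


def grad_accum_steps(global_batch: int, micro_batch: int, dp: int) -> int:
    """Hypothetical helper: micro-batches each DP replica accumulates per optimizer step."""
    samples_per_micro_step = micro_batch * dp
    if global_batch % samples_per_micro_step != 0:
        raise ValueError(
            f"global_batch={global_batch} is not divisible by micro_batch*dp={samples_per_micro_step}"
        )
    return global_batch // samples_per_micro_step


if __name__ == "__main__":
    # Illustrative layout: 64 GPUs split into TP=4 and PP=4 leaves DP=4 replicas.
    layout = ParallelLayout(world_size=64, tp=4, pp=4)
    steps = grad_accum_steps(global_batch=512, micro_batch=4, dp=layout.dp)
    print(f"dp={layout.dp}, grad_accum_steps={steps}")  # prints "dp=4, grad_accum_steps=32"
```

Raising an error on non-divisible sizes mirrors the practical constraint that every rank must belong to exactly one TP group, one PP stage and one DP replica.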