bigscience-workshop/Megatron-DeepSpeed

A fork of Megatron-LM and Megatron-DeepSpeed for training large transformer language models at scale, used by the BigScience project.

1.4k stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.8 ★/day · Trend: steady

This repository provides tools for training transformer language models including BERT and GPT-2 at scale. It integrates NVIDIA’s Megatron-LM with Microsoft’s DeepSpeed for distributed training optimizations. The project is the primary codebase for the BigScience collaborative research initiative focused on large language model development.
