huggingface/nanotron
A minimalistic library for pretraining transformer models with 3D-parallelism distributed training.

Velocity (7d): +2.7 ★/day
Trend: steady
Nanotron provides a simple and flexible API for pretraining transformer models, particularly LLMs, on custom datasets. It is optimized for speed and scalability using 3D-parallelism techniques (combining tensor, pipeline, and data parallelism) to efficiently train large models across distributed compute resources. The library is designed to make large-scale model pretraining accessible while maintaining performance.
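
To make the "3D" in 3D-parallelism concrete, here is a minimal sketch of how a pool of workers can be factored into tensor-, pipeline-, and data-parallel degrees. It does not use Nanotron's API: `ParallelLayout`, `make_layout`, and the rank ordering (tensor-parallel index varying fastest) are hypothetical names and conventions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: a world of workers factored as dp x pp x tp."""
    tp: int  # tensor-parallel degree (each layer's weights are split across tp workers)
    pp: int  # pipeline-parallel degree (layers are split into pp stages)
    dp: int  # data-parallel degree (dp model replicas see different batches)

    @property
    def world_size(self) -> int:
        return self.tp * self.pp * self.dp

    def coords(self, rank: int) -> Tuple[int, int, int]:
        """Map a global rank to (dp_rank, pp_rank, tp_rank).

        Assumed ordering: tp index varies fastest, then pp, then dp.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        tp_rank = rank % self.tp
        pp_rank = (rank // self.tp) % self.pp
        dp_rank = rank // (self.tp * self.pp)
        return dp_rank, pp_rank, tp_rank


def make_layout(world_size: int, tp: int, pp: int) -> ParallelLayout:
    """Derive the data-parallel degree from the world size and the tp/pp degrees."""
    if tp < 1 or pp < 1:
        raise ValueError("tp and pp must be positive")
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    return ParallelLayout(tp=tp, pp=pp, dp=world_size // (tp * pp))


if __name__ == "__main__":
    layout = make_layout(world_size=16, tp=2, pp=4)
    for rank in range(layout.world_size):
        print(rank, layout.coords(rank))
```

With 16 workers, `tp=2`, and `pp=4`, the sketch yields two data-parallel replicas, each made of four pipeline stages whose layers are split across two workers. Placing the tensor-parallel index innermost is a common choice because tensor parallelism communicates most frequently and benefits from keeping its group on the fastest interconnect, but real frameworks may order ranks differently.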