JonasGeiping/cramming
A research framework for training a BERT-style masked language model on a single GPU in one day.

Contains code to replicate research on "cramming": pretraining a language model under a limited compute budget. The framework re-analyzes each component of the pretraining pipeline for the single-GPU setting and provides a modified pipeline that achieves performance close to the original BERT. The code supports PyTorch 2.0+, and improved checkpoints are available on Hugging Face.
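The training objective here is BERT-style masked language modeling. The sketch below shows the masking recipe from the original BERT setup: 15% of non-special tokens are selected, and of those 80% become `[MASK]`, 10% become a random token and 10% stay unchanged. It is only an illustration that operates on plain integer token ids. The names `mask_tokens` and `IGNORE_INDEX` are hypothetical, and the cramming repository's own masking configuration may differ.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical constant: label value for positions excluded from the loss
# (-100 matches the default ignore_index of PyTorch's cross-entropy loss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (model inputs, MLM labels) using the standard BERT 80/10/10 rule."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as [CLS] / [SEP]; skip unselected tokens.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random vocabulary token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Illustrative usage with bert-base-uncased ids ([CLS]=101, [SEP]=102, [MASK]=103).
ids = [101, 7592, 2088, 2003, 2307, 102]
inp, lab = mask_tokens(ids, mask_id=103, vocab_size=30522,
                       special_ids={101, 102}, rng=random.Random(0))
```

Only positions whose label is not `IGNORE_INDEX` contribute to the loss. Keeping some selected tokens unchanged or randomized means the model cannot rely on seeing `[MASK]` to know where predictions are required.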