JonasGeiping/cramming

A research framework for training a BERT-style masked language model on a single GPU in one day.

1.4k stars · Python · Language Models · ML Frameworks
Velocity (7d): +1.1 ★/day · Trend: steady

The repository contains code to replicate research on cramming language-model pretraining into a limited compute budget. The framework re-analyzes the components of the pretraining pipeline for a single-GPU training scenario and provides a modified pipeline whose performance comes close to that of standard BERT. It supports PyTorch 2.0+, and improved checkpoints are available on Hugging Face.
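For readers new to masked language modeling, the step that turns raw text into training targets can be sketched in a few lines of pure Python. This sketch follows the recipe from the original BERT paper: select about 15% of positions, then replace 80% of them with `[MASK]`, 10% with a random token, and leave 10% unchanged. It is not taken from the cramming codebase, which may use a different masking scheme. The names `mask_tokens`, `MASK_PROB`, and `IGNORE` are hypothetical and exist only for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical constants for illustration (original BERT recipe, not
# necessarily the settings used by the cramming repository).
MASK_PROB = 0.15   # fraction of positions selected as prediction targets
IGNORE = -100      # label value meaning "exclude this position from the loss"


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence using BERT-style masking."""
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= MASK_PROB:
            continue  # not selected: input unchanged, no prediction target
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80% of selected: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token as input
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    seq = list(range(10, 30))
    inputs, labels = mask_tokens(seq, mask_id=4, vocab_size=100, rng=rng)
    print(inputs)
    print(labels)
```

A label of `-100` marks positions excluded from the loss, which matches the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so the labels can be fed to that loss directly once converted to tensors.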

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.