PiotrNawrot/nanoT5

Pre-training T5 on one GPU without the TPU tax

A PyTorch-native pipeline for pre-training T5-style models from scratch on a single A100, filling the long-standing gap for an accessible encoder-decoder baseline.

★ 1k stars · Python · ML Frameworks · Language Models

What it does

nanoT5 provides a complete training pipeline to pre-train T5-base-v1.1 from random initialization on the English C4 corpus and fine-tune it on Super-Natural Instructions, all on a single A100 in roughly 16 hours. It deliberately skips tensor and pipeline parallelism to keep the codebase readable and suited to small-scale experiments. The authors treat the model architecture as a solved problem; this project optimizes everything else around it.

The interesting bit

The project nearly closes the quality gap with Google’s original weights (40.7 vs. 40.9 RougeL on SNI) despite using 150× less data and one GPU instead of 1024 TPUs. The real discovery is in the optimizer: AdamW normally diverges during T5 pre-training, but the authors show that adding matrix-wise RMS learning-rate scaling makes it not only stable but slightly faster and better than the original Adafactor setup.
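The RMS scaling idea can be sketched in a few lines of plain Python. This is a minimal illustration, not nanoT5's actual optimizer code: the function names, the 1e-3 floor on the RMS (borrowed here from Adafactor's convention), and the default hyperparameters are all assumptions made for the example. It shows one AdamW update for a single weight matrix, where the base learning rate is multiplied by the root mean square of that matrix's current values.

```python
import math


def rms(matrix):
    """Root mean square over all entries of a 2-D list of floats."""
    flat = [x for row in matrix for x in row]
    return math.sqrt(sum(x * x for x in flat) / len(flat))


def scaled_step_size(base_lr, matrix, floor=1e-3):
    """Scale the base learning rate by the matrix's RMS.

    The floor (an assumed value) keeps the step size from collapsing
    to zero when a matrix is initialised at or near zero.
    """
    return base_lr * max(floor, rms(matrix))


def adamw_rms_step(param, grad, m, v, step, base_lr=1e-2,
                   betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0,
                   floor=1e-3):
    """Apply one in-place AdamW update to `param` with an RMS-scaled step size.

    `param`, `grad`, `m`, and `v` are 2-D lists of the same shape;
    `step` counts from 1. Returns the step size that was used.
    """
    b1, b2 = betas
    # The step size is computed once per matrix, before any entry changes.
    lr = scaled_step_size(base_lr, param, floor)
    bias_corr1 = 1.0 - b1 ** step
    bias_corr2 = 1.0 - b2 ** step
    for i, row in enumerate(param):
        for j, w in enumerate(row):
            g = grad[i][j]
            # Exponential moving averages of the gradient and its square.
            m[i][j] = b1 * m[i][j] + (1.0 - b1) * g
            v[i][j] = b2 * v[i][j] + (1.0 - b2) * g * g
            m_hat = m[i][j] / bias_corr1
            v_hat = v[i][j] / bias_corr2
            # Decoupled weight decay, then the Adam update.
            w = w - lr * weight_decay * w
            row[j] = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return lr


# Usage: one update on a tiny 1x2 "matrix".
weights = [[1.0, -1.0]]
grads = [[0.5, -0.5]]
first_moment = [[0.0, 0.0]]
second_moment = [[0.0, 0.0]]
used_lr = adamw_rms_step(weights, grads, first_moment, second_moment, step=1)
```

With this scaling, a matrix whose entries are small takes proportionally small steps, which is roughly the behaviour Adafactor gets from scaling its updates by the parameter RMS.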

Key highlights

To the authors’ knowledge, the first T5 v1.1 pre-training reproduction in PyTorch; prior official implementations were JAX/Flax only.
C4 dataset streaming and preprocessing happen on-the-fly during training, so you don’t wait hours for a 300GB download before the first step.
AdamW with RMS scaling plus Cosine schedule hits 1.953 NLL on the C4 held-out set, beating the legacy Adafactor + Inverse-Square-Root configuration (1.995).
Includes a simplified T5 model implementation intended for teaching and learning.
Built on Hugging Face Accelerate, with Hydra for configuration and support for PyTorch 2.0's torch.compile and BF16 mixed precision.
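The two learning-rate schedules compared in the highlights above have simple closed forms, sketched below in plain Python. The inverse-square-root rule follows the form used in the original T5 setup, 1/sqrt(max(step, warmup)). The cosine variant is a generic one with optional linear warmup. The function names, warmup lengths, and peak and final learning rates are assumptions for illustration and are not taken from nanoT5's configuration.

```python
import math


def inverse_sqrt_lr(step, warmup_steps=10_000):
    """Inverse-square-root schedule: flat during warmup, then 1/sqrt(step)."""
    return 1.0 / math.sqrt(max(step, warmup_steps))


def cosine_lr(step, total_steps, peak_lr, warmup_steps=0, final_lr=0.0):
    """Cosine decay from peak_lr to final_lr, with optional linear warmup."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


# Usage: sample both schedules at a few points of a hypothetical 64k-step run.
for s in (1, 16_000, 32_000, 64_000):
    print(s, inverse_sqrt_lr(s), cosine_lr(s, total_steps=64_000, peak_lr=0.02))
```

The practical difference is the shape of the tail: the inverse-square-root rule keeps decaying slowly and never reaches a fixed endpoint, while the cosine rule is tied to a known training length and lands on its final value at the last step.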

Caveats

The authors explicitly warn that if you only need inference or fine-tuning, you should use the official weights on the Hugging Face Hub instead; nanoT5's own checkpoints are worse because of the constrained compute budget.
On-the-fly data preprocessing assumes a reasonably modern CPU and fast internet; the README notes it might bottleneck on an old CPU (< 8 cores) or a slow connection.

Verdict

Grab this if you are an academic researcher or hobbyist who needs to experiment with T5 pre-training objectives, continued pre-training, or custom datasets on a single-GPU budget. Skip it if you just want the best T5 weights for downstream tasks; the official checkpoints on the Hugging Face Hub already cover that.

Frequently asked

What is PiotrNawrot/nanoT5? A PyTorch-native pipeline for pre-training T5-style models from scratch on a single A100, filling the long-standing gap for an accessible encoder-decoder baseline.
Is nanoT5 open source? Yes. PiotrNawrot/nanoT5 is open source, released under the Apache-2.0 license.
What language is nanoT5 written in? PiotrNawrot/nanoT5 is primarily written in Python.
How popular is nanoT5? PiotrNawrot/nanoT5 has 1k stars on GitHub.
Where can I find nanoT5? PiotrNawrot/nanoT5 is on GitHub at https://github.com/PiotrNawrot/nanoT5.