Is TinyZero open source?

Yes — Jiayi-Pan/TinyZero is open source, released under the Apache-2.0 license.

What language is TinyZero written in?

Jiayi-Pan/TinyZero is primarily written in Python.

How popular is TinyZero?

Jiayi-Pan/TinyZero has 13.2k stars on GitHub.

Where can I find TinyZero?

Jiayi-Pan/TinyZero is on GitHub at https://github.com/Jiayi-Pan/TinyZero.

← all repositories

Jiayi-Pan/TinyZero

Watch a 3B model teach itself to think for $30

A minimal, reproducible setup that shows small language models can develop self-verification and search abilities through pure reinforcement learning.

★13.2k stars Python Language Models ML Frameworks

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does TinyZero reproduces DeepSeek R1-Zero’s core trick—teaching a base language model reasoning through reinforcement learning alone—on toy tasks like countdown and multiplication. It runs on 1–2 GPUs and costs under $30 in compute. The project is built as a thin layer atop the veRL library, with scripts and data preprocessing to get a Qwen2.5 model training quickly.

The interesting bit The “Aha moment” here is literal: the 3B base model spontaneously develops self-verification and search strategies without any supervised fine-tuning or chain-of-thought examples. The 0.5B model, notably, fails to learn reasoning at all—suggesting a capacity threshold that is itself a useful finding.

Key highlights

Reproduces R1-Zero’s emergent reasoning on countdown and multiplication tasks
3B model learns sophisticated skills; 0.5B model does not (a built-in ablation)
Single-GPU support for models ≤1.5B, dual-GPU for 3B+
Includes instruct-model ablation with chat-template data preprocessing
Full experiment logs public on Weights & Biases

Caveats

Repository is deprecated and no longer maintained; authors direct users to upstream veRL for new RL experiments
Out-of-VRAM issues reported; gradient checkpointing may be needed
“Multiplication tasks” are mentioned but only countdown training is documented in the README

Verdict Worth a look if you want to witness emergent reasoning in a controlled, cheap setup—or if you’re skeptical that R1-Zero’s results scale down. Skip it if you need active maintenance; use veRL directly instead.

Frequently asked

What is Jiayi-Pan/TinyZero?: A minimal, reproducible setup that shows small language models can develop self-verification and search abilities through pure reinforcement learning.
Is TinyZero open source?: Yes — Jiayi-Pan/TinyZero is open source, released under the Apache-2.0 license.
What language is TinyZero written in?: Jiayi-Pan/TinyZero is primarily written in Python.
How popular is TinyZero?: Jiayi-Pan/TinyZero has 13.2k stars on GitHub.
Where can I find TinyZero?: Jiayi-Pan/TinyZero is on GitHub at https://github.com/Jiayi-Pan/TinyZero.