Mario learns to jump with kindergarten math
A stripped-down PyTorch A3C implementation that proves you don't need 500 lines of boilerplate to teach an agent to finish World 1-1.

What it does
Trains an AI agent to play Super Mario Bros using the Asynchronous Advantage Actor-Critic (A3C) algorithm from the 2016 DeepMind paper. Multiple agents explore the game in parallel, each pushing gradient updates to a shared global model, which helps them escape local optima faster than a lone plumber would. The repo ships train.py and test.py and links to a Google Drive folder of pre-trained weights.
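To make "parallel agents, shared updates" concrete, here is a toy sketch of the asynchronous-update pattern only. It is not A3C itself and not code from the repo. It assumes a hypothetical one-parameter "global model" with the loss (w - 3)^2. Each worker thread copies the current parameter, computes a noisy gradient on that local copy (standing in for a game rollout), and pushes the update into the shared value without waiting for the other workers. PyTorch A3C code typically does the same thing with `torch.multiprocessing` processes and a model whose tensors are moved into shared memory with `share_memory()`. Threads and a lock are used here only to keep the example dependency-free.

```python
import random
import threading


class SharedParams:
    """Hypothetical stand-in for the global model: a single float parameter."""

    def __init__(self, w: float = 0.0) -> None:
        self.w = w
        self.lock = threading.Lock()  # serialises writes; Hogwild-style code often skips this


def noisy_gradient(w: float, rng: random.Random) -> float:
    # Gradient of the toy loss (w - 3)^2 plus noise, standing in for one worker's rollout.
    return 2.0 * (w - 3.0) + rng.gauss(0.0, 0.1)


def worker(params: SharedParams, seed: int, steps: int, lr: float) -> None:
    rng = random.Random(seed)  # each worker explores with its own random stream
    for _ in range(steps):
        w_local = params.w  # sync: copy the global parameter (may be stale by write time)
        g = noisy_gradient(w_local, rng)  # gradient computed on the local copy
        with params.lock:
            params.w -= lr * g  # push the update into the shared parameter


def train(n_workers: int = 4, steps: int = 200, lr: float = 0.05) -> float:
    params = SharedParams()
    threads = [
        threading.Thread(target=worker, args=(params, i, steps, lr))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params.w


if __name__ == "__main__":
    print(f"learned w = {train():.3f} (optimum is 3.0)")
```

Workers never wait for each other before computing, so a gradient may be based on a slightly out-of-date parameter. The A3C paper accepts that staleness in exchange for throughput and decorrelated experience.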
The interesting bit
The author deliberately stripped away the usual cruft (fancy preprocessing pipelines, exotic weight initializations, environment wrappers) to show that “minimal setup + correct algorithm = working agent.” The README also contains an extended “dad and kid at kindergarten” analogy that actually explains actor, critic, advantage, and asynchrony without a single equation.
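The actor/critic/advantage split from the analogy boils down to a little arithmetic. The sketch below uses plain Python rather than PyTorch and covers three steps. It computes n-step discounted returns, bootstrapped from the critic's value estimate for the state after the last step. It then derives the advantage A_t = R_t - V(s_t). Finally, it computes the two loss terms from the A3C paper: the actor's policy-gradient loss and the critic's squared-error loss. The function names, the discount `gamma`, and the input numbers are illustrative assumptions. The repo's own train.py may use a variant such as generalized advantage estimation, and the original paper also adds an entropy bonus, which is omitted here.

```python
from typing import List, Tuple


def n_step_returns(rewards: List[float], bootstrap_value: float, gamma: float) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}, seeded with the critic's V(s_T)."""
    returns = []
    running = bootstrap_value  # critic's estimate after the last step (0.0 if the episode ended)
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_critic_losses(
    log_probs: List[float],
    values: List[float],
    rewards: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[float, float, List[float]]:
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    # Advantage: how much better the outcome was than the critic predicted.
    advantages = [R - v for R, v in zip(returns, values)]
    # Actor: raise log-probs of actions with positive advantage, lower the rest.
    # (With autograd the advantage is treated as a constant, e.g. via .detach().)
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages))
    # Critic: regress V(s_t) toward the observed return.
    value_loss = 0.5 * sum(a * a for a in advantages)
    return policy_loss, value_loss, advantages


if __name__ == "__main__":
    pl, vl, adv = actor_critic_losses(
        log_probs=[-0.5, -1.0, -2.0],
        values=[1.0, 0.5, 0.5],
        rewards=[1.0, 0.0, 1.0],
        bootstrap_value=0.0,
        gamma=0.5,
    )
    print(adv, pl, vl)  # [0.25, 0.0, 0.5] 1.125 0.15625
```

In the kindergarten framing, the critic's `values` are the dad's expectations and the advantage is the gap between what actually happened and those expectations.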
Key highlights
- Pure PyTorch, no distributed-training frameworks required
- Pre-trained models reportedly clear 19 stages (up from the author’s initial 9, thanks to a community contribution)
- Dependencies are barebones: Python 3.6, PyTorch, OpenAI Gym, OpenCV, NumPy
- train.py and test.py are the only entry points; no config-file archaeology needed
Caveats
- The README doesn’t specify hardware requirements, training time, or how many parallel workers are used
- No code documentation or inline comments are shown; you’ll be reading the source directly
- Pre-trained weights live on Google Drive with no versioning or checksums mentioned
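No checksums are published for the Google Drive weights. One low-effort mitigation is to record a SHA-256 digest the first time you download them and compare against it later. The following is a minimal sketch using only Python's standard library. The path in the usage comment is a hypothetical placeholder, not the repo's actual weight file name.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical path): python checksum.py path/to/downloaded_weights
    for p in sys.argv[1:]:
        print(f"{sha256_of(p)}  {p}")
```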
Verdict
Grab this if you want a readable, no-magic A3C reference implementation in PyTorch. Skip it if you need production-grade reproducibility, hyperparameter sweeps, or modern alternatives like PPO.