vietnh1009/Super-mario-bros-PPO-pytorch

PPO learns Mario, almost beats Bowser

A clean PyTorch implementation of PPO that clears 31 of 32 Super Mario Bros levels, with the author admitting level 8-4 still wins.

1.3k stars · Python · ML Frameworks · Agents
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Trains a reinforcement-learning agent to play Super Mario Bros using Proximal Policy Optimization (PPO) in PyTorch. You pick a world and stage, set a learning rate, and run train.py. The author provides a Dockerfile for GPU training, plus test.py to render the results to an MP4.
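For readers new to PPO, the core idea is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The sketch below is the textbook form in plain Python, not the repository's PyTorch code; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch.

    Illustrative only: names and the 0.2 default are assumptions,
    not taken from the repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old one.
        ratio = math.exp(new_lp - old_lp)
        # Clamp the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)
```

With a positive advantage, the gain from raising an action's probability is capped once the ratio passes `1 + clip_eps`; with a negative advantage the unclipped penalty is kept, which is what makes the objective conservative.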

The interesting bit

The author's earlier A3C implementation cleared only 19 of 32 levels. Switching to PPO raised that to 31 of 32, and the README candidly admits that the one remaining level (8-4) is a maze puzzle the agent still can't solve. The fix for stuck levels? Brute-force learning-rate search, including a run at 7e-5 that finally succeeded on level 1-3 after 70 failed attempts.
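A brute-force search like this is easy to script. The sketch below only builds one `train.py` command per candidate learning rate, using the `--world`, `--stage`, and `--lr` flags from the README's example. The helper `sweep_commands` and the candidate list are hypothetical (only 1e-4 and 7e-5 appear in the README summary), and how the author actually ran the search is not documented here.

```python
import shlex
import sys


def sweep_commands(world, stage, learning_rates, script="train.py"):
    """Build one training command per candidate learning rate.

    Hypothetical helper: only the --world/--stage/--lr flags come from
    the README's example command.
    """
    return [
        # sys.executable stands in for "python" so the same interpreter is used.
        [sys.executable, script, "--world", str(world),
         "--stage", str(stage), "--lr", lr]
        for lr in learning_rates
    ]


if __name__ == "__main__":
    # Candidate values are illustrative; pass them as strings so they
    # reach the command line exactly as written.
    for cmd in sweep_commands(1, 3, ["1e-3", "1e-4", "1e-5", "7e-5"]):
        print(shlex.join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run`. Deciding whether a run "succeeded" still means watching the agent finish the level, which the sketch leaves to the reader.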

Key highlights

  • 31 of 32 levels cleared; only the maze level 8-4 remains undefeated
  • Direct comparison to the author’s earlier A3C implementation (19/32 levels)
  • Docker support, with a documented rendering bug and workaround
  • Simple CLI: python train.py --world 5 --stage 2 --lr 1e-4
  • Test mode outputs MP4 videos for review

Caveats

  • Docker training requires manually commenting out env.render() to avoid a rendering bug
  • The author notes some levels need extensive learning-rate tuning (70 failed attempts on level 1-3 before 7e-5 worked)
  • The README gives no details on network architecture or reward shaping

Verdict

Worth a look if you want a working, reproducible PPO baseline for NES emulation. Skip it if you need a fully general RL framework — this is tightly coupled to Mario and the gym-super-mario-bros environment.
