← all repositories
mokemokechicken/reversi-alpha-zero

AlphaGo Zero crammed into an 8×8 board

A full self-play pipeline for Reversi, minus the $25M DeepMind budget.

686 stars Python AgentsML Frameworks
reversi-alpha-zero
Velocity · 7d
+0.2
★ / day
Trend
steady
star history

What it does

This is a from-scratch implementation of AlphaGo Zero’s three-worker pipeline—self-play, training, and evaluation—shoehorned into learning Reversi (a.k.a. Othello). You get MCTS with policy/value networks, model promotion via head-to-head matches, and even a wxPython GUI to get humbled by your own creation. It also speaks the NBoard protocol if you want to pit it against established engines.

The interesting bit

The author actually bothered to expose the AlphaGo Zero vs. AlphaZero distinction as a config toggle: flip use_newest_next_generation_model and the evaluator worker becomes vestigial, with the freshest checkpoint always promoted. Most hobby reimplementations gloss over this; here it’s a first-class switch.

Key highlights

  • Three-worker pipeline: self (generate games), opt (train), eval (promote winners)
  • Pre-trained “challenge 5” model available via shell script; no GPU required for GUI play
  • NBoard2.0 engine support for benchmarking against serious Reversi AIs
  • Configurable MCTS parameters: simulation_num_per_move, c_puct, dirichlet_alpha, etc.
  • Optional “solver” integration from a specified turn onward (exact endgame solving, presumably)

Caveats

  • TensorFlow 1.3.0 and Keras 2.0.8—archaeology by current standards
  • Windows setup is a slog: Python 3.5, no f-strings, Visual C++ 2015 build tools, hidden folders
  • README trails off mid-sentence in the “reversi-arena” section; auto-evaluation details are incomplete

Verdict

Worth a look if you’re studying AlphaGo Zero mechanics on commodity hardware, or want a Reversi engine you can actually modify. Skip it if you need modern TF/PyTorch or a batteries-included install.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.