AlphaGo Zero crammed into an 8×8 board
A full self-play pipeline for Reversi, minus the $25M DeepMind budget.

What it does
This is a from-scratch implementation of AlphaGo Zero’s three-worker pipeline—self-play, training, and evaluation—shoehorned into learning Reversi (a.k.a. Othello). You get MCTS with policy/value networks, model promotion via head-to-head matches, and even a wxPython GUI to get humbled by your own creation. It also speaks the NBoard protocol if you want to pit it against established engines.
The interesting bit
The author actually bothered to expose the AlphaGo Zero vs. AlphaZero distinction as a config toggle: flip use_newest_next_generation_model and the evaluator worker becomes vestigial, with the freshest checkpoint always promoted. Most hobby reimplementations gloss over this; here it’s a first-class switch.
Key highlights
- Three-worker pipeline:
self(generate games),opt(train),eval(promote winners) - Pre-trained “challenge 5” model available via shell script; no GPU required for GUI play
- NBoard2.0 engine support for benchmarking against serious Reversi AIs
- Configurable MCTS parameters:
simulation_num_per_move,c_puct,dirichlet_alpha, etc. - Optional “solver” integration from a specified turn onward (exact endgame solving, presumably)
Caveats
- TensorFlow 1.3.0 and Keras 2.0.8—archaeology by current standards
- Windows setup is a slog: Python 3.5, no f-strings, Visual C++ 2015 build tools, hidden folders
- README trails off mid-sentence in the “reversi-arena” section; auto-evaluation details are incomplete
Verdict
Worth a look if you’re studying AlphaGo Zero mechanics on commodity hardware, or want a Reversi engine you can actually modify. Skip it if you need modern TF/PyTorch or a batteries-included install.