When AlphaGo met Jesse Livermore: a trading bot's origin story
A 2016-vintage experiment in teaching reinforcement learning to "read the tape" on stock markets, frozen in time when its author started a real RL trading company.

What it does
This repo trains DQN and policy-gradient agents to trade stocks using TensorFlow. The agent learns to hold, buy, or short across fixed-length episodes, receiving a terminal reward based on final portfolio value minus transaction costs. The author wanted to test whether an agent could learn to “read tape” — interpret price action the way old-school traders did.
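The episodic setup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repo's code: the name run_episode, the action encoding, the unit position size, and the cost_per_trade value are all hypothetical, and cumulative profit and loss stands in for final portfolio value.

```python
import numpy as np

# Hypothetical action encoding; the repo may use a different one.
HOLD, BUY, SHORT = 0, 1, 2


def run_episode(prices, actions, cost_per_trade=0.01):
    """Replay one fixed-length episode and return a single terminal reward.

    The position is +1 (long), -1 (short) or 0 (flat). HOLD keeps the
    current position. Every unit of position change pays `cost_per_trade`.
    No reward is produced until the episode ends.
    """
    prices = np.asarray(prices, dtype=float)
    if len(actions) != len(prices) - 1:
        raise ValueError("need exactly one action per price step")

    position = 0
    pnl = 0.0
    costs = 0.0
    for t, action in enumerate(actions):
        # Map the action to a target position; HOLD leaves it unchanged.
        target = {HOLD: position, BUY: 1, SHORT: -1}[action]
        costs += cost_per_trade * abs(target - position)
        position = target
        # Mark the position to market over the next price step.
        pnl += position * (prices[t + 1] - prices[t])

    # Terminal reward: profit and loss net of transaction costs.
    return pnl - costs
```

With prices [100, 101, 103, 102] and the actions buy, hold, short, the sketch returns about 3.97: four points of price gain minus 0.03 in costs (one unit traded on entry, two units on the flip from long to short).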
The interesting bit
The README doubles as a genuine development journal, complete with the pivot from Chainer to TensorFlow because “all the cool kids even DeepMind (the gods) have started using TensorFlow.” The author also muses on why CNNs might suit price data (small input changes shouldn’t trigger trades) before sensibly settling on a two-layer feed-forward network to avoid normalization headaches. It’s refreshingly unpolished — a snapshot of someone thinking out loud while the 2016 RL hype wave was still building.
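For readers unfamiliar with the architecture, a two-layer feed-forward network that scores the three actions might look roughly like the following. It is a sketch in plain NumPy rather than TensorFlow, to stay self-contained. The layer sizes, the ReLU activation, and the conversion of prices to relative changes are assumptions for illustration, not details taken from the repo.

```python
import numpy as np

N_ACTIONS = 3  # hold, buy, short


def init_params(n_inputs, hidden, rng):
    """Small random weights for a hidden layer plus an output layer."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_inputs, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, N_ACTIONS)),
        "b2": np.zeros(N_ACTIONS),
    }


def action_values(params, price_window):
    """Map a window of n_inputs + 1 prices to one score per action."""
    p = np.asarray(price_window, dtype=float)
    # Assumption of this sketch: feed relative price changes so the
    # input scale does not depend on the absolute price level.
    x = np.diff(p) / p[:-1]
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]  # linear output layer


rng = np.random.default_rng(0)
params = init_params(n_inputs=4, hidden=8, rng=rng)
scores = action_values(params, [100.0, 101.0, 102.0, 101.0, 103.0])
greedy_action = int(np.argmax(scores))  # index of the highest-scoring action
```

In a DQN setting the three outputs would be read as action-value estimates; in a policy-gradient setting they would pass through a softmax to give action probabilities.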
Key highlights
- Two working implementations: DQN (dqn_model.py) and policy gradients (pg_model.py)
- Episodic training design: terminal reward only, avoiding the complexity of per-step reward calculation in trading
- Includes Google Drive links to Nifty/NSE futures data for immediate reproduction
- Extensive reading list: Sutton’s RL book, David Silver’s lectures, AlphaGo papers
- Author now runs an actual RL trading company and has (understandably) stopped supporting this repo
Caveats
- The author explicitly states: “Leave other directories, I am not working on them for now”; only tensor-reinforcement/ is current
- No visible test results, performance metrics, or profitability claims in the README
- Data dependencies live on external file-sharing links (mostly Google Drive, some 4shared) that may rot
- The “deep thoughts” journal and Google Doc suggest this was a learning project, not a finished system
Verdict
Worth a skim if you’re researching the evolution of retail RL-for-finance experiments, or if you want to see how someone reasoned through network architecture choices in 2016. Skip it if you need production-ready trading infrastructure or current maintenance — this is a time capsule, not a product.