A Udacity capstone that taught Q-learning to trade stocks
An educational reinforcement learning project where an agent learns to buy and sell a single stock through rewards and penalties, not explicit instructions.

What it does
This is a Python 2.7 implementation of Q-learning applied to single-stock trading. The agent experiments with three actions (buy, sell, hold), receives profit or loss as its reward signal, and gradually builds a policy without being told upfront what “good” trading looks like. The repo includes training, testing, and hyperparameter optimization modes, plus a detailed LaTeX report and a test notebook.
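To make the learning loop concrete, here is a minimal sketch of the textbook tabular Q-learning update with the three actions named above. It is not taken from the repo: the function names, the learning rate alpha, the exploration rate epsilon, and the integer state labels are all illustrative assumptions, and the code targets Python 3 rather than the project's Python 2.7.

```python
import random
from collections import defaultdict

# The three actions the review mentions; the repo's own encoding may differ.
ACTIONS = ("buy", "sell", "hold")


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])


# Hypothetical single step: buying in state 0 earned a profit of 1.0.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q_table, state=0, action="buy", reward=1.0,
         next_state=1, alpha=0.5, gamma=0.9)
```

After this single step the value for buying in state 0 rises from 0.0 to 0.5, so a greedy agent (epsilon of 0) would now prefer buy in that state. Repeating the update over many simulated trades is how a policy can emerge from profit and loss alone.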
The interesting bit
The project grounds itself in market microstructure literature, citing an MIT thesis on electronic market-making and a 2014 paper on order-book price impact, rather than treating trading as a generic RL benchmark. That specificity is rare in student capstone projects.
Key highlights
- Implements tabular Q-learning with explicit state discretization for a single-asset order book environment
- Includes hyperparameter search for k (state granularity) and gamma (discount factor)
- Ships with a full academic report and a test notebook via nbviewer
- References actual finance literature, not just Sutton & Barto
- Apache 2.0 licensed
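The state discretization and hyperparameter search mentioned in the highlights can be sketched as follows. This is an illustration under stated assumptions, not the repo's code: it assumes k is the number of equal-width bins used to turn a continuous feature into a table index, and the names discretize, grid_search, and toy_score are hypothetical. In practice the scoring function would train and test the agent; here it is a stand-in.

```python
from itertools import product


def discretize(value, low, high, k):
    """Map a continuous value in [low, high] to an integer bin in 0..k-1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if high <= low:
        raise ValueError("high must exceed low")
    clipped = min(max(value, low), high)  # out-of-range values go to edge bins
    index = int((clipped - low) / (high - low) * k)
    return min(index, k - 1)  # the upper edge itself belongs to the last bin


def grid_search(evaluate, k_values, gamma_values):
    """Return the (k, gamma) pair that maximizes evaluate(k, gamma)."""
    return max(product(k_values, gamma_values),
               key=lambda pair: evaluate(*pair))


def toy_score(k, gamma):
    """Stand-in for 'train the agent, then measure test profit' (assumption)."""
    return -(k - 4) ** 2 - (gamma - 0.9) ** 2


best_k, best_gamma = grid_search(toy_score, [2, 4, 8], [0.5, 0.9, 0.99])
```

A larger k gives the agent a finer view of the market but a bigger table to fill, which is the trade-off a search over k is meant to explore.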
Caveats
- Requires Python 2.7 and the bintrees library, both effectively deprecated
- No performance benchmarks or Sharpe ratios disclosed in the README
- Training runs take “several minutes”; scale and reproducibility are unclear
Verdict
Worth a look if you’re teaching or learning RL and want a worked example with financial context. Skip it if you need production-ready trading infrastructure or modern Python; this is a well-documented student project, not a framework.