Teaching a neural net to lose money on Bitcoin, faster
A Deep Q-Learning agent that trades Bitcoin using the DeepSense architecture, with Docker support and TensorBoard logging.

What it does
This project trains a reinforcement learning agent to trade Bitcoin on a per-minute basis. It uses Deep Q-Learning with three possible actions per trading unit: neutral, long, or short. The agent receives rewards based on its current position and learns to maximize accumulated returns. A Docker image is provided that pulls fresh Coinbase transaction data, preprocesses it, and spins up TensorBoard on port 6006.
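To make the action space concrete, here is a minimal Python sketch of one Deep Q-Learning step over the three actions. It assumes the per-step reward is the held position (+1 long, -1 short, 0 neutral) times the next one-minute price change, and it uses the standard one-step DQN target r + γ·max Q(s′, a′). The `Action` encoding, `step_reward`, and the discount value are illustrative assumptions, not the repository's code. In the real agent the Q-values would come from the network; here they are plain lists.

```python
import random
from enum import IntEnum


class Action(IntEnum):
    # Hypothetical index encoding; the repository's actual ordering may differ
    NEUTRAL = 0
    LONG = 1
    SHORT = 2


# Assumed position held for each action: flat, +1 unit, -1 unit
POSITION = {Action.NEUTRAL: 0, Action.LONG: 1, Action.SHORT: -1}


def step_reward(action, price_now, price_next):
    """Assumed reward: held position times the next one-minute price change."""
    return POSITION[action] * (price_next - price_now)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore randomly, otherwise take argmax Q."""
    if rng.random() < epsilon:
        return Action(rng.randrange(len(Action)))
    return Action(max(range(len(q_values)), key=lambda i: q_values[i]))


def q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """One-step DQN target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

The network would then be trained to move Q(s, a) for the chosen action toward `q_target(...)`, which is how the agent learns to maximize accumulated returns.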
The interesting bit
The Q-function approximation uses DeepSense, an architecture originally designed for sensor fusion (think accelerometer + gyroscope data), adapted here for a single time series. The author also borrows the “unrealized PnL” reward concept from prior work and plans to add exponential decay weighting to stabilize learning, though that item is still unchecked on the to-do list.
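Since the decayed reward is still on the to-do list, the sketch below is only one plausible reading of an “exponentially decayed weighted unrealized PnL”: the mark-to-market gain of an open position, averaged over the minutes since entry with weights that shrink by a factor `decay` per minute of age. The function names, the choice of a weighted average, and the `decay` parameter are all hypothetical; the prior work the author cites may define the reward differently.

```python
def unrealized_pnl(position, entry_price, current_price):
    """Mark-to-market gain of an open position: +1 long, -1 short, 0 flat."""
    return position * (current_price - entry_price)


def decayed_unrealized_pnl(position, entry_price, prices, decay=0.9):
    """Hypothetical reward: exponentially weighted average of unrealized PnL.

    prices: prices observed since the position was opened, oldest first.
    decay:  assumed per-minute weight multiplier, 0 < decay <= 1; the most
            recent minute gets weight 1, the one before it `decay`, and so on.
    """
    if not prices:
        return 0.0
    weighted_sum = 0.0
    weight_total = 0.0
    # Walk backwards so that age 0 is the most recent price
    for age, price in enumerate(reversed(prices)):
        weight = decay ** age
        weighted_sum += weight * unrealized_pnl(position, entry_price, price)
        weight_total += weight
    return weighted_sum / weight_total
```

With `decay=1.0` this reduces to a plain average of unrealized PnL over the holding period; as `decay` shrinks toward 0 it approaches the latest unrealized PnL alone, so the parameter trades smoothness of the reward signal against responsiveness.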
Key highlights
- Preprocessing splits Coinbase per-minute data into blocks of continuous prices and extracts 180-minute history windows, discarding blocks too short to hold a training episode
- Docker image includes vim, screen, and auto-fetched Bitcoin price data at /deep-trading-agent/data/btc.csv
- TensorFlow 1.1.0 implementation, adapted from existing DeepSense and DQN-tensorflow repos
- Wiki documents dataset, architecture, and reward function details
- Python 2.7 codebase (yes, really)
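The block-splitting and windowing step can be sketched roughly as follows. This assumes the input is a time-sorted list of `(unix_timestamp, price)` pairs at one-minute spacing; the real `btc.csv` layout, the `horizon` parameter, and every function name here are assumptions for illustration rather than the project's preprocessing code.

```python
WINDOW = 180  # history length in minutes, as described for this project


def split_into_blocks(rows, step=60):
    """Split time-sorted (unix_timestamp, price) rows into runs with no missing minutes."""
    blocks, current = [], []
    prev_ts = None
    for ts, price in rows:
        # Any spacing other than exactly one step closes the current block
        if prev_ts is not None and ts - prev_ts != step:
            blocks.append(current)
            current = []
        current.append((ts, price))
        prev_ts = ts
    if current:
        blocks.append(current)
    return blocks


def usable_blocks(blocks, window=WINDOW, horizon=1):
    """Keep blocks long enough for a full history window plus `horizon` future steps.

    `horizon` is a hypothetical parameter for the minutes needed after the window.
    """
    return [b for b in blocks if len(b) >= window + horizon]


def history_windows(block, window=WINDOW):
    """Yield sliding price histories of length `window` from one continuous block."""
    prices = [p for _, p in block]
    for end in range(window, len(prices) + 1):
        yield prices[end - window:end]
```

Filtering on block length is what shrinks the raw block count so sharply: a single missing minute splits a run in two, and only runs longer than the window survive.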
Caveats
- Python 2.7 and TensorFlow 1.1.0 are frozen in amber; modern environments will need the Docker image or significant migration work
- The “exponentially decayed weighted unrealized PnL” reward function — described as key to stabilizing learning — is listed as not yet implemented
- Advanced preprocessing (gap-filling to increase usable training blocks) is also marked “to be implemented”
- 588,000 raw blocks of continuous prices collapse to 887 usable blocks after filtering, suggesting the per-minute data is far more fragmented than its raw size implies
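One simple form the unimplemented gap-filling could take is forward-filling: repeating the last known price across short outages so that neighboring blocks merge before the series is split into continuous runs. This is a swapped-in technique chosen for illustration, not the author's stated plan; `max_gap`, the function name, and the `(unix_timestamp, price)` row format are assumptions.

```python
def forward_fill(rows, step=60, max_gap=5):
    """Hypothetical gap filler: repeat the last known price across short outages.

    rows:    time-sorted (unix_timestamp, price) pairs.
    max_gap: assumed maximum number of missing minutes to fill; longer gaps are
             left untouched so the series still breaks there.
    """
    if not rows:
        return []
    filled = [rows[0]]
    for ts, price in rows[1:]:
        prev_ts, prev_price = filled[-1]
        missing = (ts - prev_ts) // step - 1  # whole minutes absent between rows
        if 0 < missing <= max_gap:
            for k in range(1, missing + 1):
                filled.append((prev_ts + k * step, prev_price))
        filled.append((ts, price))
    return filled
```

A flat synthetic price is a crude assumption (it tells the agent nothing happened during the outage), so `max_gap` should stay small enough that the filled minutes do not dominate a 180-minute window.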
Verdict
Worth a look if you’re studying reinforcement learning in financial time series and want a concrete, runnable baseline to dissect or modernize. Skip it if you need production-ready trading infrastructure or are allergic to legacy Python.