When your portfolio manager is a neural network with a math degree
A 2017 deep RL framework for automated crypto portfolio management that sidesteps the usual reinforcement-learning headaches by treating rebalancing as immediate reward optimization.

What it does
PGPortfolio trains a neural network to allocate capital across a basket of assets—originally cryptocurrencies—using policy gradient methods. The whole system is configurable via JSON: network topology, training regime, input data windows. It bundles TensorBoard visualization, parallel training for hyperparameter search, and baseline financial algorithms from the OLPS toolkit for comparison.
The interesting bit
The authors found a way to make policy gradients behave more like supervised learning. Instead of bootstrapping future returns or running Monte Carlo rollouts, they optimize immediate reward regularized by transaction costs. The gradients become direct and cheap to compute—training reportedly finishes in under 30 minutes with tuned hyperparameters. The README also contains an admirably frank erratum: the arXiv v2 paper’s test period was ~30% shorter than reality, causing data leakage between asset selection and backtest periods. They fixed it here, not in the paper yet.
Key highlights
- Custom policy gradient formulation avoids expensive value-function estimation
- JSON-driven configuration for models, training, and data pipelines
- Built-in comparison against classical online portfolio selection algorithms
- Parallel training support for hyperparameter optimization
- TensorBoard integration for monitoring training dynamics
Caveats
- Dependencies lock you into TensorFlow ≥1.0.0 and tflearn—effectively a 2017 technology stack
- The authors explicitly warn that market efficiency has likely improved since 2017; the exact algorithm may no longer work
- All results are backtests on static historical data; slippage and market impact unmodeled
- Live trading disclaimer is blunt: “All trading strategies are used at your own risk”
Verdict
Worth studying if you’re building trading infrastructure and want to see how RL can be massaged into something tractable for finance. Skip it if you need production-ready, modern code—this is research archaeology with honest footnotes.