ZhengyaoJiang/PGPortfolio

When your portfolio manager is a neural network with a math degree

A 2017 deep RL framework for automated crypto portfolio management that sidesteps the usual reinforcement-learning headaches by treating rebalancing as immediate reward optimization.

1.9k stars · Python · +0.6 stars/day over the last 7 days (trend: steady)

What it does

PGPortfolio trains a neural network to allocate capital across a basket of assets—originally cryptocurrencies—using policy gradient methods. The whole system is configurable via JSON: network topology, training regime, input data windows. It bundles TensorBoard visualization, parallel training for hyperparameter search, and baseline financial algorithms from the OLPS toolkit for comparison.
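To make the JSON-driven setup concrete, here is a minimal sketch of what such a configuration could look like and how it might be loaded. The section names and every key inside them are hypothetical, chosen only to mirror the three areas mentioned above (network topology, training regime, input windows); they are not the repository's actual schema.

```python
import json

# Hypothetical config: key names are illustrative, not the repository's real schema.
CONFIG_TEXT = """
{
  "layers": [
    {"type": "conv", "filters": 3, "kernel": [1, 2]},
    {"type": "dense", "units": 10}
  ],
  "training": {"steps": 1000, "batch_size": 64, "learning_rate": 0.001},
  "input": {"window_size": 30, "asset_count": 11, "period_seconds": 1800}
}
"""

REQUIRED_SECTIONS = {"layers", "training", "input"}


def load_config(text):
    """Parse a JSON config and check that the assumed top-level sections exist."""
    config = json.loads(text)
    missing = REQUIRED_SECTIONS - set(config)
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return config


config = load_config(CONFIG_TEXT)
print(config["input"]["window_size"])  # prints 30
```

Keeping everything in one file like this is what makes parallel hyperparameter search cheap to set up: each run only needs its own copy of the config.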

The interesting bit

The authors found a way to make policy gradients behave more like supervised learning. Instead of bootstrapping future returns or running Monte Carlo rollouts, they optimize the immediate reward, penalized by transaction costs. The gradients become direct and cheap to compute: training reportedly finishes in under 30 minutes with tuned hyperparameters. The README also contains an admirably frank erratum: the test period stated in the arXiv v2 paper was about 30% shorter than the one actually used, so the asset-selection window overlapped the backtest period and leaked data into it. They fixed it here, not in the paper yet.
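The immediate-reward idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the repository's loss function: the function name, the `cost_rate` value, and the linear turnover penalty are assumptions made for the example. The reward is the average log growth of the portfolio per period, with each period's growth shrunk in proportion to how much was traded.

```python
import math


def immediate_reward(weights, price_relatives, cost_rate=0.0025):
    """Average per-period log return, net of a simple turnover cost.

    weights:         list of per-period weight vectors, each summing to 1
    price_relatives: list of per-period price ratios (end price / start price)
    cost_rate:       assumed proportional fee on traded volume (hypothetical value)
    """
    total = 0.0
    drifted = None  # weights after the previous period's price moves
    for w, y in zip(weights, price_relatives):
        gross = sum(wi * yi for wi, yi in zip(w, y))
        # Turnover is the distance between target weights and drifted weights.
        # The first period is assumed to start already at the target weights.
        if drifted is None:
            turnover = 0.0
        else:
            turnover = sum(abs(wi - di) for wi, di in zip(w, drifted))
        total += math.log(gross * (1.0 - cost_rate * turnover))
        drifted = [wi * yi / gross for wi, yi in zip(w, y)]
    return total / len(weights)


flat = [[1.0, 1.0], [1.0, 1.0]]  # prices do not move
hold = immediate_reward([[1.0, 0.0], [1.0, 0.0]], flat)
churn = immediate_reward([[1.0, 0.0], [0.0, 1.0]], flat)
print(hold, churn)  # churning pays fees, so its reward is lower
```

Because the reward depends only on the weights chosen for each period and the observed price moves, it is a closed-form function of the network output, which is why no value function or rollout is needed to get a gradient.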

Key highlights

  • Custom policy gradient formulation avoids expensive value-function estimation
  • JSON-driven configuration for models, training, and data pipelines
  • Built-in comparison against classical online portfolio selection algorithms
  • Parallel training support for hyperparameter optimization
  • TensorBoard integration for monitoring training dynamics

Caveats

  • Dependencies lock you into TensorFlow ≥1.0.0 and tflearn—effectively a 2017 technology stack
  • The authors explicitly warn that market efficiency has likely improved since 2017; the exact algorithm may no longer work
  • All results are backtests on static historical data; slippage and market impact are not modeled
  • Live trading disclaimer is blunt: “All trading strategies are used at your own risk”

Verdict

Worth studying if you’re building trading infrastructure and want to see how RL can be massaged into something tractable for finance. Skip it if you need production-ready, modern code—this is research archaeology with honest footnotes.
