rorysroes/SGX-Full-OrderBook-Tick-Data-Trading-Strategy

Teaching scikit-learn to read the market's mind, 10 seconds ahead

A Jupyter-based pipeline that extracts hand-crafted features from SGX full order-book tick data and runs a battery of off-the-shelf classifiers, mostly tree ensembles, to predict short-term price moves.

2.3k stars · Jupyter Notebook · Domain Apps · Other AI
Velocity (7d): +0.6 ★ / day · Trend: steady

What it does

The repo is an end-to-end notebook pipeline for high-frequency trading research on Singapore Exchange (SGX) tick data. It engineers two core features, “Rise Ratio” and “Depth Ratio”, from the raw limit order book, then trains and cross-validates five off-the-shelf classifiers (Random Forest, Extra Trees, AdaBoost, Gradient Boosting, SVM) to predict the direction of the price move over the next 10 seconds. A simple backtest translates the best model’s predictions into P&L curves.
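To make the model shootout concrete, here is a minimal sketch of cross-validating the five classifier families with scikit-learn. It is not the repo's code: the data is synthetic noise standing in for the two features, the hyperparameters are arbitrary, and `TimeSeriesSplit` is an assumption chosen so that each fold trains only on earlier rows (the notebook's actual cross-validation scheme may differ).

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the two engineered features (NOT real SGX data).
n = 600
X = rng.normal(size=(n, 2))
# Hypothetical label: 1 = price up over the next window, 0 = not up.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Arbitrary, untuned hyperparameters; the repo's settings are not reproduced here.
models = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0),
}

# Assumption: time-ordered folds, so no fold trains on rows from its own future.
cv = TimeSeriesSplit(n_splits=4)

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16}: {score:.3f}")
print("best:", best)
```

On tick data, shuffled k-fold tends to leak future information into training folds, which is why the sketch uses an ordered split.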

The interesting bit

The value is in the feature engineering, not model wizardry. The author distills noisy Level-2 tick data into just two interpretable ratios (one capturing bid-ask pressure, the other queue depth imbalance), then lets the five classifiers fight it out via cross-validation. It’s a clean, reproducible template for the “classical ML on microstructure” approach that predates the current LLM hype cycle.
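For a feel of what two such ratios could look like, here is a hedged NumPy sketch. The formulas are assumptions made for illustration, not the notebook's definitions: `depth_ratio` is taken as total resting bid size over total resting ask size, and `rise_ratio` as the percentage change of the mid price against a look-back snapshot. Both function names and the `lookback` parameter are hypothetical.

```python
import numpy as np


def depth_ratio(bid_sizes, ask_sizes):
    """Assumed definition: summed bid size / summed ask size per snapshot.

    Inputs are 2-D arrays shaped (snapshots, book levels).
    """
    bid_total = np.asarray(bid_sizes, dtype=float).sum(axis=1)
    ask_total = np.asarray(ask_sizes, dtype=float).sum(axis=1)
    return bid_total / ask_total


def rise_ratio(mid, lookback):
    """Assumed definition: % change of mid price vs. `lookback` snapshots ago.

    The first `lookback` entries have no reference price and are NaN.
    """
    mid = np.asarray(mid, dtype=float)
    out = np.full(mid.shape, np.nan)
    out[lookback:] = (mid[lookback:] - mid[:-lookback]) / mid[:-lookback] * 100.0
    return out


# Toy three-level book over three snapshots (made-up numbers).
bid_sizes = [[10, 20, 30], [5, 5, 10], [40, 10, 10]]
ask_sizes = [[20, 20, 20], [10, 20, 10], [10, 10, 10]]
mid = [100.0, 101.0, 99.0]

depth = depth_ratio(bid_sizes, ask_sizes)  # 60/60, 20/40, 60/30
rise = rise_ratio(mid, lookback=1)
print(depth, rise)
```

A depth ratio above 1 means more size is queued on the bid than on the ask in that snapshot; whether that predicts anything is exactly what the classifiers are asked to find out.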

Key highlights

  • Full order-book feature extraction: Rise Ratio and Depth Ratio from raw tick data
  • Model shootout across RandomForest, ExtraTrees, AdaBoost, GradientBoosting, and SVM
  • 10-second-ahead directional prediction with cross-validation for model selection
  • Simple P&L backtest attached to show trading outcome, not just classification accuracy
  • Pure Jupyter Notebook implementation; no hidden C++ or infrastructure code
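The 10-second-ahead target in the list above can be sketched as follows. This is one plausible labeling scheme, assumed for illustration: each tick is compared with the first tick at or after `t + 10s`, and ticks with no such future tick get a sentinel of -1. The function name, the sentinel, and the "up vs. not up" binary encoding are hypothetical choices, not taken from the notebook.

```python
import numpy as np


def direction_labels(times_s, mid, horizon_s=10.0):
    """Label each tick 1 if the mid price is higher `horizon_s` seconds later,
    0 otherwise, and -1 where the data ends before the horizon is reached.

    `times_s` must be sorted ascending (seconds).
    """
    times_s = np.asarray(times_s, dtype=float)
    mid = np.asarray(mid, dtype=float)

    # Index of the first tick at or after t + horizon, for every tick t.
    future_idx = np.searchsorted(times_s, times_s + horizon_s, side="left")
    valid = future_idx < len(times_s)

    labels = np.full(len(times_s), -1, dtype=int)
    labels[valid] = (mid[future_idx[valid]] > mid[valid]).astype(int)
    return labels


# Irregularly spaced toy ticks (made-up numbers).
times_s = [0.0, 4.0, 10.0, 15.0, 21.0]
mid = [100.0, 101.0, 102.0, 100.5, 103.0]

labels = direction_labels(times_s, mid)
print(labels)  # [ 1  0  1 -1 -1]
```

Rows labeled -1 would be dropped before training, since their outcome is unknown.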

Caveats

  • README is sparse on data access: SGX tick data is not included and sourcing it yourself is non-trivial (expensive or restricted)
  • The “trading strategy” layer appears to be a basic directional signal-to-P&L translation; execution latency, slippage, and market impact are not addressed
  • No code-level documentation or module structure; this is research-grade notebook spaghetti
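To see why the missing slippage matters, here is a minimal signal-to-P&L sketch with an optional flat cost charged per unit of position change. It is an illustration under stated assumptions, not the repo's backtest: positions are in {-1, 0, +1}, fills happen at the observed price with no latency, and `cost_per_trade` is a hypothetical stand-in for spread and slippage.

```python
import numpy as np


def backtest(signal, price_change, cost_per_trade=0.0):
    """Cumulative P&L of holding `signal[i]` units over `price_change[i]`.

    A cost is charged whenever the position changes, proportional to the
    size of the change (flipping from +1 to -1 counts as two units).
    """
    signal = np.asarray(signal, dtype=float)
    price_change = np.asarray(price_change, dtype=float)

    gross = signal * price_change
    # Position change per step, treating the starting position as flat.
    position_change = np.abs(np.diff(signal, prepend=0.0))
    costs = position_change * cost_per_trade
    return np.cumsum(gross - costs)


# Toy predictions and subsequent price moves (made-up numbers).
signal = [1, 1, -1, 0]
price_change = [0.5, -0.2, -0.3, 0.4]

gross_curve = backtest(signal, price_change)
net_curve = backtest(signal, price_change, cost_per_trade=0.1)
print(gross_curve, net_curve)
```

In this toy run the final P&L drops from 0.6 gross to 0.2 net once a 0.1 cost per unit traded is applied, so a curve that ignores costs can overstate a fast-trading signal considerably.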

Verdict

Worth a look if you’re learning market microstructure feature engineering or need a baseline “classical ML vs. LOB” notebook to benchmark fancier methods against. Skip it if you need production HFT infrastructure, options/futures support, or a strategy you can run with your retail brokerage API.
