Teaching scikit-learn to read the market's mind, 10 seconds ahead
A Jupyter-based pipeline that extracts hand-crafted features from SGX full order-book tick data and runs a battery of classifiers (four ensemble methods plus an SVM) to predict short-term price moves.

What it does
The repo is an end-to-end notebook pipeline for high-frequency trading research on Singapore Exchange (SGX) tick data. It engineers two core features, “Rise Ratio” and “Depth Ratio”, from the raw limit order book, then trains and cross-validates five off-the-shelf classifiers (Random Forest, Extra Trees, AdaBoost, Gradient Boosting, SVM) to predict the price direction over the next 10 seconds. A simple backtest translates the best model’s predictions into P&L curves.
The interesting bit
The value is in the feature engineering, not model wizardry. The author distills noisy Level-2 tick data into just two interpretable ratios (one capturing bid-ask pressure, the other queue-depth imbalance), then lets the five classifiers fight it out via cross-validation. It’s a clean, reproducible template for the “classical ML on microstructure” approach that predates the current LLM hype cycle.
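The exact formulas behind the two ratios are not given here, so the following is only a sketch under stated assumptions, not the author's definitions. It takes Depth Ratio to be summed bid size over summed ask size across the top book levels, and Rise Ratio to be the current mid price over its trailing-window mean. The column names (`bid_price_1`, `bid_size_1`, and so on), the `levels` and `window` parameters, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def depth_ratio(book: pd.DataFrame, levels: int = 5) -> pd.Series:
    """ASSUMED definition: summed bid size / summed ask size over top levels."""
    bid = book[[f"bid_size_{i}" for i in range(1, levels + 1)]].sum(axis=1)
    ask = book[[f"ask_size_{i}" for i in range(1, levels + 1)]].sum(axis=1)
    # An empty ask side yields NaN instead of a division-by-zero infinity.
    return bid / ask.replace(0, np.nan)


def rise_ratio(book: pd.DataFrame, window: str = "10s") -> pd.Series:
    """ASSUMED definition: mid price / trailing-window mean of the mid price."""
    mid = (book["bid_price_1"] + book["ask_price_1"]) / 2
    # Time-based rolling window; requires a sorted DatetimeIndex.
    return mid / mid.rolling(window).mean()


# Hypothetical four-tick book with two levels per side, one tick per second.
index = pd.Timestamp("2024-01-01 09:00:00") + pd.to_timedelta(np.arange(4), unit="s")
toy_book = pd.DataFrame(
    {
        "bid_price_1": [10.0, 10.0, 10.2, 10.2],
        "ask_price_1": [10.2, 10.2, 10.4, 10.4],
        "bid_size_1": [30, 10, 20, 40],
        "bid_size_2": [30, 10, 20, 40],
        "ask_size_1": [20, 20, 20, 0],
        "ask_size_2": [10, 20, 20, 0],
    },
    index=index,
)

features = pd.DataFrame(
    {
        "depth_ratio": depth_ratio(toy_book, levels=2),
        "rise_ratio": rise_ratio(toy_book),
    }
)
```

Under these assumed definitions, a depth ratio above 1 means more resting size on the bid than on the ask, and a rise ratio above 1 means the mid price sits above its recent average.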
Key highlights
- Full order-book feature extraction: Rise Ratio and Depth Ratio from raw tick data
- Model shootout across RandomForest, ExtraTrees, AdaBoost, GradientBoosting, and SVM
- 10-second-ahead directional prediction with cross-validation for model selection
- Simple P&L backtest attached to show trading outcome, not just classification accuracy
- Pure Jupyter Notebook implementation; no hidden C++ or infrastructure code
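The model shootout described above can be sketched roughly as follows. This is an illustration on synthetic data, not the notebook's code: the two random feature columns stand in for Rise Ratio and Depth Ratio, the hyperparameters are arbitrary, and `TimeSeriesSplit` is an assumption on my part (the splitter the repo actually uses is not stated). It is chosen here because shuffled folds on tick data let the model train on observations that come after the ones it is tested on.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: two feature columns and a noisy up/down label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# The five classifier families named in the review; settings are illustrative.
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    # SVMs are scale-sensitive, so standardize inside the pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Each fold trains on earlier rows and tests on the rows that follow them.
cv = TimeSeriesSplit(n_splits=5)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
best = max(scores, key=scores.get)
```

`scores` maps each model name to its mean out-of-fold accuracy, and `best` is the name with the highest score, which is the model a backtest would then be run on.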
Caveats
- README is sparse on data access: SGX tick data is not included and sourcing it yourself is non-trivial (expensive or restricted)
- The “trading strategy” layer appears to be a basic directional signal-to-P&L translation; execution latency, slippage, and market impact are not addressed
- No code-level documentation or module structure; this is research-grade notebook spaghetti
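To make the second caveat concrete, here is a minimal sketch of a directional signal-to-P&L translation with an optional per-trade cost. The function `signal_pnl`, its `cost_per_trade` parameter, and the numbers are hypothetical and not taken from the repo; a flat cost is a crude stand-in for slippage and says nothing about latency or market impact.

```python
import numpy as np


def signal_pnl(signal, price_change, cost_per_trade=0.0):
    """Cumulative P&L of a +1/0/-1 position signal.

    signal[t] is the position held over interval t and price_change[t] is
    the price move over that same interval. A cost is charged each time the
    position changes, in proportion to the size of the change.
    """
    signal = np.asarray(signal, dtype=float)
    price_change = np.asarray(price_change, dtype=float)
    gross = signal * price_change
    # Units traded per step; the book starts flat, hence prepend=0.
    trades = np.abs(np.diff(signal, prepend=0.0))
    return np.cumsum(gross - cost_per_trade * trades)


# Hypothetical predictions (long, long, short, flat) and realized price moves.
signal = [1, 1, -1, 0]
price_change = [0.02, -0.01, -0.03, 0.01]

gross_curve = signal_pnl(signal, price_change)
net_curve = signal_pnl(signal, price_change, cost_per_trade=0.01)
```

In this toy case the gross curve ends at 0.04, while a cost of 0.01 per unit traded (four units in total) brings the net curve back to zero. That is why a frictionless backtest on 10-second signals should be read as an upper bound.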
Verdict
Worth a look if you’re learning market microstructure feature engineering or need a baseline “classical ML vs. LOB” notebook to benchmark fancier methods against. Skip it if you need production HFT infrastructure, options/futures support, or a strategy you can run with your retail brokerage API.