How to win Kaggle by ignoring your own loss function
A first-place time-series forecaster that trains blind, deliberately ignores NaN losses, and ensembles 30 checkpoints from three models on one TensorFlow graph.

What it does
This is the winning solution to Kaggle’s 2017 Web Traffic Time Series Forecasting competition. It predicts future Wikipedia page views with a seq2seq RNN encoder-decoder built in TensorFlow, leaning heavily on cuDNN for GPU acceleration.
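To make the encoder-decoder shape concrete, here is a toy sketch with scalar tanh cells and hand-picked weights. Everything in it (function names, weights, the feed-back decoder) is a hypothetical illustration of the general seq2seq pattern, not the repo's TensorFlow/cuDNN model. The encoder folds the input history into one hidden state, and the decoder unrolls from that state, feeding each prediction back in as its next input.

```python
import math


def rnn_step(x: float, h: float, w_x: float, w_h: float, b: float) -> float:
    """One step of a scalar tanh RNN cell: h' = tanh(w_x * x + w_h * h + b)."""
    return math.tanh(w_x * x + w_h * h + b)


def encode(history: list[float], w_x: float = 0.5, w_h: float = 0.8, b: float = 0.0) -> float:
    """Encoder: fold the whole input series into a single hidden state."""
    h = 0.0
    for x in history:
        h = rnn_step(x, h, w_x, w_h, b)
    return h


def decode(h: float, last_value: float, horizon: int,
           w_y: float = 0.5, w_h: float = 0.8, b: float = 0.0,
           w_out: float = 1.0, b_out: float = 0.0) -> list[float]:
    """Decoder: start from the encoder state and feed each prediction back in."""
    preds = []
    y = last_value
    for _ in range(horizon):
        h = rnn_step(y, h, w_y, w_h, b)
        y = w_out * h + b_out  # linear readout of the hidden state
        preds.append(y)
    return preds


def forecast(history: list[float], horizon: int) -> list[float]:
    """Encode the history, then decode `horizon` future steps."""
    return decode(encode(history), history[-1], horizon)


print(forecast([0.1, 0.3, 0.2, 0.4], horizon=3))
```

With untrained weights the numbers are meaningless; the point is the two loops, which a real model runs with vector-valued states and learned parameters.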
The interesting bit
The training strategy is deliberately counterintuitive: it runs in “blind mode” with no evaluation of model performance during training, and the README calmly notes that NaN losses are “normal.” The actual trick is extreme ensembling: three models with different seeds are trained on a single graph, ten checkpoints are saved from the final 1000 steps, and all 30 resulting checkpoints are ensembled at prediction time. It’s less machine learning, more machine stubbornness.
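The checkpoint bookkeeping can be sketched as follows. This assumes the ten checkpoints are evenly spaced at 100-step intervals ending at step 11,500, and that the ensemble is a plain per-timestep mean; the repo's exact save steps and combination rule may differ. checkpoint_steps, harvest, ensemble and fake_predict are hypothetical names.

```python
from statistics import mean
from typing import Callable, Sequence

Checkpoint = tuple[int, int]  # hypothetical (model_index, step) identifier


def checkpoint_steps(save_from_step: int, max_steps: int, n_checkpoints: int) -> list[int]:
    """Evenly spaced save steps after save_from_step, ending at max_steps (assumed schedule)."""
    interval = (max_steps - save_from_step) // n_checkpoints
    return [save_from_step + interval * (i + 1) for i in range(n_checkpoints)]


def harvest(n_models: int, steps: Sequence[int]) -> list[Checkpoint]:
    """Every (model, step) pair: n_models * len(steps) checkpoints in total."""
    return [(m, s) for m in range(n_models) for s in steps]


def ensemble(predict: Callable[[Checkpoint, Sequence[float]], list[float]],
             checkpoints: Sequence[Checkpoint],
             history: Sequence[float]) -> list[float]:
    """Run every checkpoint on the same history and average each forecast step (assumed mean)."""
    per_checkpoint = [predict(ckpt, history) for ckpt in checkpoints]
    return [mean(values) for values in zip(*per_checkpoint)]


def fake_predict(ckpt: Checkpoint, history: Sequence[float]) -> list[float]:
    """Hypothetical stand-in for loading one checkpoint and forecasting 2 steps."""
    model_index, _step = ckpt
    return [history[-1] + model_index] * 2


steps = checkpoint_steps(save_from_step=10_500, max_steps=11_500, n_checkpoints=10)
checkpoints = harvest(n_models=3, steps=steps)
print(steps)  # [10600, 10700, 10800, 10900, 11000, 11100, 11200, 11300, 11400, 11500]
print(len(checkpoints))  # 30
print(ensemble(fake_predict, checkpoints, [5.0, 6.0]))  # [7.0, 7.0]
```

Every forecast requires one forward pass per checkpoint, which is why inference cost scales with the 30-way ensemble.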
Key highlights
- First-place finish on a high-profile Kaggle forecasting competition
- Single-graph multi-seed training with n_models=3 and checkpoint harvesting from steps 10,500–11,500
- cuDNN-dependent RNN implementation; CPU training explicitly will not work
- Modular pipeline: feature extraction (make_features.py), TF data preprocessing (input_pipe.py), model definition (model.py), and hyperparameter sets (hparams.py)
- Multiple hyperparameter configurations available (s32, definc, inst81, etc.)
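Single-graph multi-seed training is a TensorFlow construct, but the idea can be sketched in plain Python, assuming that "one graph" simply means several independently seeded models updated on the same batch at every step. TinyLinearModel and train_multi_seed are hypothetical stand-ins, not the repo's code.

```python
import random


class TinyLinearModel:
    """Hypothetical stand-in for one model: y_hat = w * x + b."""

    def __init__(self, seed: int):
        rng = random.Random(seed)  # per-model seed -> different initial weights
        self.w = rng.uniform(-1.0, 1.0)
        self.b = rng.uniform(-1.0, 1.0)

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def sgd_step(self, batch: list[tuple[float, float]], lr: float) -> None:
        """One SGD step on mean squared error; both gradients use the pre-update weights."""
        n = len(batch)
        grad_w = sum(2 * (self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.predict(x) - y) for x, y in batch) / n
        self.w -= lr * grad_w
        self.b -= lr * grad_b


def train_multi_seed(batches, n_models: int = 3, base_seed: int = 0,
                     lr: float = 0.05) -> list[TinyLinearModel]:
    """Train n_models seeded copies side by side; each step feeds all of them one batch."""
    models = [TinyLinearModel(base_seed + i) for i in range(n_models)]
    for batch in batches:
        for model in models:  # every model sees the identical batch
            model.sgd_step(batch, lr)
    return models


# Toy data drawn from y = 2x + 1, repeated as 500 identical batches.
batch = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
models = train_multi_seed([batch] * 500)
for m in models:
    print(round(m.w, 4), round(m.b, 4))
```

In this sketch the seed is the only thing that differs between models, so any diversity the later ensemble can exploit comes from initialization.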
Caveats
- Requires GPU; the README is unambiguous that CPU training fails
- Reproduction demands specific Kaggle dataset files (key_2.csv.zip, train_2.csv.zip) that are no longer trivially accessible
- Prediction loads and evaluates 30 sets of model weights, so inference is slow by construction
- The “blind mode” approach means you get no training feedback until submission time
Verdict
Worth studying if you’re building time-series ensembles or curious about Kaggle-winning heuristics. Skip it if you need a maintainable production pipeline; this is competition code, optimized for leaderboard position over engineering hygiene.