When WaveNet forecasts Wikipedia traffic instead of speech
A Kaggle competition solution that repurposes a generative audio model to predict 145,000 Wikipedia page-view time series.

What it does
This repo contains a top-performing solution to Kaggle’s 2017 Web Traffic Time Series Forecasting competition. The task: predict 64 days of daily page views for ~145,000 Wikipedia articles, with separate series per access type (all, mobile, desktop) and agent (all agents or spiders only). The model is a single neural network trained to minimize symmetric mean absolute percentage error (SMAPE) across all series simultaneously.
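SMAPE is the competition's scoring metric, so a small definition makes the objective concrete. The sketch below is pure Python and is not taken from the repo; the name `smape` is hypothetical. It assumes the common Kaggle convention that a day where both the actual and forecast values are zero contributes zero error.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumes the common Kaggle convention: a term where both the actual and
    forecast values are 0 contributes 0 error instead of dividing by zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom > 0:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


# A perfect forecast scores 0. Predicting 0 for a day with real traffic
# scores the per-day maximum of 200.
print(smape([10, 20, 0], [10, 20, 0]))  # 0.0
print(smape([10], [0]))                 # 200.0
```

Because every error is divided by the size of the actual and forecast values, a miss of 10 views on a page with 20 views costs far more than the same miss on a page with 100,000 views. This helps explain how a single model can be scored fairly across very differently sized series.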
The interesting bit
The author adapted DeepMind’s WaveNet, an architecture built for generating raw audio, into a sequence-to-sequence forecaster. The twist: WaveNet was designed for next-step prediction. Producing a 64-day forecast therefore means feeding each prediction back in as the next input, so small errors snowball over the horizon. The fix is an encoder-decoder setup in which the encoder and decoder do not share parameters. It is trained to minimize loss across the full unrolled forecast, so the decoder learns to handle its own accumulating noise.
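The toy example below shows why one-step accuracy can mislead. It also shows what a loss over the full unrolled horizon measures. This is a hedged illustration, not the repo's code: the linear "models", the helper names `unrolled_forecast` and `horizon_mae`, and the use of mean absolute error are all hypothetical stand-ins. The assumed truth rises by exactly 1.0 per day, while the model has learned a slightly wrong drift of 1.05.

```python
def unrolled_forecast(step, history, horizon):
    """Autoregressive rollout: each prediction is appended and reused as input."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        nxt = step(window)
        preds.append(nxt)
        window.append(nxt)  # the new value becomes input for the next step
    return preds


def horizon_mae(preds, targets):
    """Mean absolute error averaged over every step of the forecast horizon."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


# Hypothetical toy processes: true drift of 1.0 per day vs. a learned 1.05.
def true_step(window):
    return window[-1] + 1.0


def model_step(window):
    return window[-1] + 1.05


history = [0.0, 1.0, 2.0]
horizon = 64
truth = unrolled_forecast(true_step, history, horizon)
preds = unrolled_forecast(model_step, history, horizon)

print(abs(preds[0] - truth[0]))    # one-step error, about 0.05
print(abs(preds[-1] - truth[-1]))  # error after 64 fed-back steps, about 3.2
print(horizon_mae(preds, truth))   # average over the whole horizon, about 1.625
```

A 0.05 one-step error looks harmless, but it grows linearly once predictions are fed back. An objective averaged over all 64 steps penalizes that drift directly, which is the motivation the write-up gives for training on the unrolled forecast.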
Key highlights
- Single model handles all 145k heterogeneous time series, no per-article customization
- Dilated causal convolutions capture long-range temporal patterns without recurrence
- Sequence-to-sequence training directly optimizes for multi-horizon accuracy, not just one-step-ahead
- Includes sample forecasts with log-transformed visualizations against withheld ground truth
- Achieved competitive results on a well-known public benchmark with documented methodology
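The dilated causal convolutions mentioned above can be sketched in a few lines of pure Python. This is a minimal single-channel version with no bias or nonlinearity, so it is not the repo's TensorFlow layer. The function names, the kernel size of 2, and the doubling dilation schedule are illustrative assumptions; the source does not state the model's actual layer configuration.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Single-channel dilated causal 1D convolution.

    y[t] = sum_k weights[k] * x[t - k * dilation], with indices before the
    start of the series treated as zero (left padding), so y[t] never
    depends on future values x[t + 1:].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Time steps (including the current one) visible to a stack of layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# With kernel size 2, eight layers with dilations 1, 2, 4, ..., 128 see
# 256 days of history; eight undilated layers see only 9.
print(receptive_field(2, [2 ** i for i in range(8)]))  # 256
print(receptive_field(2, [1] * 8))                     # 9

series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_causal_conv1d(series, [1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, while the parameter count grows only linearly. That is how a convolutional stack can cover months of daily history without recurrence.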
Caveats
- Frozen in 2017: requires Python 2.7, TensorFlow 1.3.0, and scikit-learn 0.18.1 (reproducing today will need containerization or patience)
- Hardware requirements are steep: a GPU with 12 GB of memory is recommended for training
- The README doesn’t state the final competition rank or SMAPE score, so precise performance is unclear
Verdict
Worth studying if you’re adapting generative architectures to forecasting or need a concrete dilated causal convolution example. Skip it if you want a maintained, production-ready pipeline: this is a competition artifact with period-accurate dependencies.