sjvasquez/web-traffic-forecasting

When WaveNet forecasts Wikipedia traffic instead of speech

A Kaggle competition solution that repurposes a generative audio model to predict 145,000 Wikipedia page-view time series.

666 stars Python Domain AppsML Frameworks

What it does This repo contains a top-performing solution to Kaggle’s 2017 Web Traffic Forecasting competition. The task: predict 64 days of daily page views for ~145,000 Wikipedia articles, broken down by traffic type (all, mobile, desktop, spider). The model is a single neural network trained on all series at once to minimize symmetric mean absolute percentage error (SMAPE).
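For readers unfamiliar with the metric, symmetric mean absolute percentage error can be sketched in a few lines of plain Python. This is an illustrative version, not code from the repo: the function name `smape` is hypothetical, and treating a term where both the actual and forecast values are zero as zero error is an assumed convention.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumed convention: when both values are zero, the term counts as zero error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


print(smape([100, 100], [100, 100]))  # 0.0 (perfect forecast)
print(smape([100], [0]))              # 200.0 (worst case)
```

Because each term is scaled by the size of the values involved, a page with ten daily views and one with ten million contribute on the same footing, which is what makes the metric workable across very different series.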

The interesting bit The author adapted DeepMind’s WaveNet, an architecture built for generating raw audio, into a sequence-to-sequence forecaster. The twist: WaveNet is trained for next-step prediction, so when its own predictions are fed back in as inputs, errors snowball over a 64-day horizon. The fix is an encoder-decoder setup in which the two halves do not share parameters, trained to minimize loss across the full unrolled forecast so the decoder learns to handle its own accumulating noise.
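The snowballing effect is easy to see in a toy setting. The sketch below is not the repo's model: `biased_step` is a hypothetical one-step predictor with a small constant bias, and mean absolute error stands in for the competition loss. It shows how feeding predictions back compounds the error, and what it means to score the whole unrolled horizon rather than only the first step.

```python
def unroll_forecast(step_fn, history, horizon):
    """Generate `horizon` steps, feeding each prediction back as input."""
    window = list(history)  # copy so the caller's history is left untouched
    forecast = []
    for _ in range(horizon):
        y_hat = step_fn(window)  # one-step-ahead prediction
        forecast.append(y_hat)
        window.append(y_hat)     # the next step conditions on this prediction
    return forecast


def unrolled_loss(step_fn, history, targets):
    """Mean absolute error over the whole unrolled horizon, not just step one."""
    forecast = unroll_forecast(step_fn, history, len(targets))
    return sum(abs(f - t) for f, t in zip(forecast, targets)) / len(targets)


def biased_step(window):
    """Hypothetical one-step model: last value plus a small constant bias."""
    return window[-1] + 0.5


history = [10.0, 10.0, 10.0]
targets = [10.0, 10.0, 10.0, 10.0]
print(unroll_forecast(biased_step, history, 4))      # [10.5, 11.0, 11.5, 12.0]
print(unrolled_loss(biased_step, history, targets))  # 1.25
```

A one-step error of 0.5 grows to 2.0 by the fourth step. A model trained only on one-step loss never sees that growth, whereas a loss over the unrolled forecast penalizes it directly.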

Key highlights

  • Single model handles all 145k heterogeneous time series, no per-article customization
  • Dilated causal convolutions capture long-range temporal patterns without recurrence
  • Sequence-to-sequence training directly optimizes for multi-horizon accuracy, not just one-step-ahead
  • Includes sample forecasts with log-transformed visualizations against withheld ground truth
  • Competitive results in a well-known public Kaggle competition, with the methodology documented in the README
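As a concrete illustration of the dilated causal convolutions mentioned in the highlights, here is a minimal pure-Python sketch. It is not taken from the repo: the function names, the kernel size, and the dilation schedule are all assumptions chosen for illustration.

```python
def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution: out[t] depends only on x[t], x[t-d], x[t-2d], ...

    weights[0] multiplies the current sample and weights[k] the sample
    k * dilation steps in the past. Positions before the start of the
    series are treated as zero (implicit left padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # never look before the series start, never ahead of t
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input steps (current one included) visible at the top of a layer stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dilated_causal_conv(x, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
# Illustrative schedule: doubling the dilation at each of six layers.
print(receptive_field(2, [1, 2, 4, 8, 16, 32]))        # 64
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth, which is how a convolutional stack can cover long histories without recurrence.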

Caveats

  • Frozen in 2017: requires Python 2.7, TensorFlow 1.3.0, and scikit-learn 0.18.1 (reproducing today will need containerization or patience)
  • Hardware requirement is stiff: 12 GB GPU recommended for training
  • README doesn’t state final competition rank or SMAPE score, so precise performance is unclear

Verdict Worth studying if you’re adapting generative architectures to forecasting or need a concrete dilated-causal-conv example. Skip if you want a maintained, production-ready pipeline: this is a competition artifact with period-accurate dependencies.
