xinychen/transdim

Filling the gaps where traffic sensors go dark

A research collection of tensor-completion models for imputing missing spatiotemporal traffic data and forecasting from incomplete observations.

1.3k stars · Jupyter Notebook · Domain Apps · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

transdim is a research codebase that tackles a specific, unglamorous problem: traffic sensors fail, and standard forecasting models assume clean data. It provides Jupyter Notebook implementations of matrix and tensor factorization methods, both existing (BPMF, TRMF, HaLRTC) and author-proposed (BTMF, BGCP, BATF, BTTF, LRTC-TNN), for two tasks: imputing missing values in spatiotemporal traffic datasets, and predicting future states when historical observations are incomplete.
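
To make the imputation task concrete, here is a minimal sketch of low-rank matrix completion on a sensors-by-time matrix. It is not code from the repository and not one of the listed models: it uses plain regularized alternating least squares, and the function name `als_impute` and its parameters (`rank`, `lam`, `n_iter`) are hypothetical choices for illustration.

```python
import numpy as np

def als_impute(Y, mask, rank=2, lam=1e-2, n_iter=50, seed=0):
    """Fill gaps in Y (sensors x time) with a rank-`rank` factorization.

    `mask` is True where Y is observed. Illustrative only: plain
    regularized alternating least squares, not a transdim model.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    W = 0.1 * rng.standard_normal((m, rank))  # sensor factors
    X = 0.1 * rng.standard_normal((n, rank))  # time factors
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each sensor factor from the time steps that sensor observed.
        for i in range(m):
            obs = mask[i]
            Xo = X[obs]
            W[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ Y[i, obs])
        # Update each time factor from the sensors observed at that step.
        for t in range(n):
            obs = mask[:, t]
            Wo = W[obs]
            X[t] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ Y[obs, t])
    # Keep observed readings untouched; fill only the gaps.
    return np.where(mask, Y, W @ X.T)

if __name__ == "__main__":
    # Synthetic rank-2 data with roughly 30% of entries hidden (assumed setup).
    rng = np.random.default_rng(1)
    truth = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 60))
    mask = rng.random(truth.shape) > 0.3
    filled = als_impute(np.where(mask, truth, np.nan), mask)
    rmse = np.sqrt(np.mean((filled[~mask] - truth[~mask]) ** 2))
    print(f"RMSE on hidden entries: {rmse:.4f}")
```

Error is measured only on the hidden entries, since the observed ones are copied through unchanged.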

The interesting bit

The project treats missingness as a first-class citizen rather than a preprocessing step. It formalizes three realistic missing patterns (random, non-random, and blockout) and evaluates models against each. The proposed LATC (Low-Rank Autoregressive Tensor Completion) framework attempts to unify imputation and prediction in a single tensor-completion structure, which is the kind of conceptual tidiness that papers reward but production systems rarely attempt.
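
The three missing patterns can be sketched as boolean masks over a (sensors, days, time slots) tensor. This is an illustration under stated assumptions, not the repository's own mask code: the function names are hypothetical, "non-random" is assumed here to mean a sensor losing whole days at a time, and the blockout window length `block` is an arbitrary choice.

```python
import numpy as np

def random_mask(shape, rate, rng):
    """Random missing: each entry is dropped independently."""
    return rng.random(shape) >= rate

def non_random_mask(shape, rate, rng):
    """Non-random missing (assumed form): a sensor loses a whole day at once."""
    n_sensors, n_days, n_slots = shape
    keep = rng.random((n_sensors, n_days)) >= rate
    # Repeat each per-day decision across every time slot of that day.
    return np.repeat(keep[:, :, None], n_slots, axis=2)

def blockout_mask(shape, rate, rng, block=6):
    """Blockout missing: all sensors lose the same windows of `block` slots."""
    n_sensors, n_days, n_slots = shape
    n_total = n_days * n_slots
    n_blocks = -(-n_total // block)  # ceiling division
    keep_blocks = rng.random(n_blocks) >= rate
    keep_time = np.repeat(keep_blocks, block)[:n_total].reshape(n_days, n_slots)
    # The same time pattern is shared by every sensor.
    return np.broadcast_to(keep_time, shape).copy()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (5, 4, 12)  # 5 sensors, 4 days, 12 slots per day (toy sizes)
    for make in (random_mask, non_random_mask, blockout_mask):
        mask = make(shape, 0.3, rng)
        print(make.__name__, "observed fraction:", mask.mean())
```

In each mask, True marks an observed entry, so a held-out test set is simply the values at `~mask`.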

Key highlights

  • Nine imputation and five prediction models, all in NumPy-based Jupyter notebooks
  • Benchmarked on eight public datasets: PeMS, Guangzhou, Seattle, Hangzhou metro, NYC taxi, and others
  • Explicit handling of three missing-data mechanisms, including the nasty case where all sensors drop out simultaneously (blockout missing)
  • Author-proposed models (bolded in the README) include Bayesian tensor factorization variants and a truncated nuclear norm approach
  • Direct links to download datasets and run notebooks without additional scaffolding

Caveats

  • Pure NumPy/Notebook implementation: no pip-installable package, no CLI, no training pipeline abstraction
  • Python 3.7 badge suggests the codebase has not been refreshed recently
  • Coverage is uneven: some models only run on NYC or Pacific data, others skip Birmingham or London entirely

Verdict

Grab this if you’re writing a thesis on spatiotemporal tensor completion and need reference implementations with real traffic datasets. Skip it if you need a production-ready imputation library or modern PyTorch/TensorFlow tooling.
