trailbehind/DeepOSM

Teaching neural nets to spot bad map data from space

DeepOSM uses OpenStreetMap labels to train a network that finds roads in satellite imagery—and flags where the map might be wrong.

1.3k stars · Python · Computer Vision · ML Frameworks
Star velocity (7d): +0.4 ★/day · Trend: steady

What it does

DeepOSM downloads NAIP satellite imagery (1-meter resolution, RGB plus infrared) and OpenStreetMap vector data for the same area, cuts both into 64×64-pixel tiles, and trains a TensorFlow neural net to predict whether the center 9 pixels of each tile contain a road. It then renders JPEG overlays showing predictions, labels, and “false positives” where OSM claims a road exists but the model disagrees.
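The tiling and labeling step can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's own code: the function name `tile_and_label` is hypothetical, the tiles are assumed not to overlap, the road mask is assumed to be already rasterized from OSM ways, and "center 9 pixels" is read here as a 9×9 central window.

```python
import numpy as np

TILE_SIZE = 64    # tile edge length in pixels, as described above
CENTER_SIZE = 9   # assumption: "center 9 pixels" means a 9x9 central window


def tile_and_label(image, road_mask, tile_size=TILE_SIZE, center_size=CENTER_SIZE):
    """Cut an image into non-overlapping tiles and label each one.

    image:     (H, W, C) array of pixel bands, e.g. R, G, B, infrared.
    road_mask: (H, W) boolean array, True where a rasterized OSM road falls.
    Returns (tiles, labels): tiles has shape (N, tile_size, tile_size, C);
    labels has shape (N,), with 1 where a road pixel lies in the center window.
    The image must be at least one tile in each dimension.
    """
    height, width = road_mask.shape
    start = (tile_size - center_size) // 2  # offset of the center window in a tile
    stop = start + center_size
    tiles, labels = [], []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tiles.append(image[row:row + tile_size, col:col + tile_size])
            window = road_mask[row + start:row + stop, col + start:col + stop]
            labels.append(int(window.any()))
    return np.stack(tiles), np.array(labels)
```

Partial tiles at the right and bottom edges are simply dropped in this sketch; a real pipeline might pad or overlap them instead.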

The interesting bit

The project inverts the usual crowdsourcing flow: instead of humans correcting maps, it trains a machine to audit the map by learning from the map itself. The default run covers ~200 km² of Delaware and reportedly reaches 75–80% accuracy after about a minute of training a network with a single fully-connected ReLU layer, which suggests a proof of concept rather than a production pipeline.
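To make "a single fully-connected ReLU layer" concrete, here is a hedged forward-pass sketch in plain NumPy rather than the project's TensorFlow code. The hidden width, the weight initialization, the two-class softmax output, and the names `init_params` and `predict_proba` are all assumptions for illustration; training is omitted.

```python
import numpy as np


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights for one hidden layer plus a 2-class output layer."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.01, size=(input_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.01, size=(hidden_dim, 2)),
        "b2": np.zeros(2),
    }


def predict_proba(tiles, params):
    """Return (N, 2) probabilities: column 0 = no road, column 1 = road."""
    x = tiles.reshape(len(tiles), -1).astype(np.float64)       # flatten each tile
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])  # fully-connected + ReLU
    logits = hidden @ params["w2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)        # stabilize the softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

For 64×64 tiles with four bands, `input_dim` would be 64 * 64 * 4 = 16384. A model this shallow sees each pixel only through one linear map and one nonlinearity, which is consistent with the modest accuracy reported.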

Key highlights

  • Data pipeline fetches NAIP tiles from a Mapbox S3 requester-pays bucket and OSM PBF extracts from Geofabrik
  • Dockerized setup with optional nvidia-docker for GPU acceleration
  • Outputs overlaid JPEGs for visual inspection of predictions vs. ground truth
  • Includes a Jupyter notebook path for experimentation
  • Cites Mnih’s 2013 thesis on aerial image labeling as its theoretical foundation
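The overlay idea from the highlights can be illustrated with a short NumPy sketch. It is not taken from the repository: the helper names `disagreement_masks` and `overlay_mask`, the tint color, and the blend factor are hypothetical, and the masks are assumed to be per-pixel booleans aligned with the image.

```python
import numpy as np


def disagreement_masks(predicted, labeled):
    """Split disagreements between model output and OSM labels by direction.

    Returns (osm_only, model_only): pixels where only OSM marks a road,
    and pixels where only the model does.
    """
    predicted = np.asarray(predicted, dtype=bool)
    labeled = np.asarray(labeled, dtype=bool)
    osm_only = labeled & ~predicted
    model_only = predicted & ~labeled
    return osm_only, model_only


def overlay_mask(rgb, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into a copy of `rgb` wherever `mask` is True.

    rgb:  (H, W, 3) uint8 image.
    mask: (H, W) boolean NumPy array.
    """
    out = rgb.astype(np.float64)  # astype returns a copy, so rgb is untouched
    tint = np.array(color, dtype=np.float64)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return np.round(out).astype(np.uint8)
```

The resulting array could then be written out as a JPEG with an imaging library such as Pillow, for example `Image.fromarray(result).save("overlay.jpg")`.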

Caveats

  • Requires AWS credentials with a credit card on file (S3 requester-pays costs “a few cents”)
  • “Very limited test suite” per the README’s own admission
  • Single fully-connected layer is deliberately simple; accuracy ceiling is modest
  • Last meaningful commit activity appears to date from 2016–2017 (TensorFlow 1.x, nvidia-docker v1.0.1)
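The requester-pays caveat comes down to one extra request parameter. The sketch below only builds the arguments for an S3 GetObject call; the helper name `requester_pays_args` is hypothetical, and the bucket and key in the usage comment are placeholders, not the project's real paths.

```python
def requester_pays_args(bucket, key):
    """Build keyword arguments for an S3 GetObject call on a requester-pays bucket.

    "RequestPayer": "requester" acknowledges that the caller's AWS account,
    not the bucket owner, is billed for the transfer.
    """
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}


# With boto3 installed and AWS credentials configured, usage would look like:
#   import boto3
#   args = requester_pays_args("example-bucket", "path/to/tile.tif")  # placeholders
#   body = boto3.client("s3").get_object(**args)["Body"].read()
```

Without that parameter, S3 rejects requests to a requester-pays bucket, which is why the project needs credentials tied to a billable account.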

Verdict

Worth a look if you’re building map-validation pipelines or teaching a class on geospatial ML. Skip it if you need state-of-the-art segmentation: modern alternatives, such as models from the SpaceNet challenges or tooling built around Sentinel Hub imagery, have moved well past this architecture.
