Teaching drones to read campus maps from three angles
A PyTorch benchmark for matching drone, satellite, and street views of 1,652 university buildings so UAVs can figure out where they are without GPS.

What it does
University-1652 is a dataset and training baseline for cross-view geo-localization. It pairs 50K+ images of 1,652 buildings across 72 universities, captured from drone, satellite, and street-level perspectives. The code trains a model to match the same building across these views: for example, retrieving the matching satellite image for a drone photo, or guiding a drone back to a location using a satellite reference image.
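At its core, this kind of cross-view matching is nearest-neighbor retrieval over learned embeddings. Here is a minimal sketch, assuming every image has already been encoded into a feature vector. The toy vectors, the building IDs, and the helper names `cosine` and `rank_gallery` are made up for illustration and are not the repository's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return gallery building IDs sorted from most to least similar to the query."""
    scored = [(cosine(query, feat), building_id) for building_id, feat in gallery.items()]
    scored.sort(reverse=True)
    return [building_id for _, building_id in scored]

# Hypothetical embeddings: one drone-view query, three satellite-view gallery entries.
drone_query = [0.9, 0.1, 0.2]
satellite_gallery = {
    "building_0001": [0.1, 0.9, 0.3],
    "building_0002": [0.8, 0.2, 0.1],
    "building_0003": [0.2, 0.1, 0.9],
}

ranking = rank_gallery(drone_query, satellite_gallery)
print(ranking[0])  # the top-1 match for the drone query
```

Swapping the roles (a satellite query against a drone gallery) gives the navigation direction of the task; the ranking logic stays the same.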
The interesting bit
The dataset is deliberately split by university: 33 schools for training, 39 held-out schools for testing. That forces models to generalize to unseen campuses rather than memorizing specific buildings. The authors also publish flight-path KML files and building coordinates, so you can replay drone trajectories in Google Earth Pro.
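To make the idea of a university-disjoint split concrete, here is a small sketch. The dataset ships with its own fixed split, so this is only an illustration of the principle; the function `split_by_university`, the toy mapping, and the random selection of training universities are all assumptions.

```python
import random

def split_by_university(building_to_university, num_train_universities, seed=0):
    """Split buildings so that no university appears in both train and test."""
    universities = sorted(set(building_to_university.values()))
    rng = random.Random(seed)
    rng.shuffle(universities)
    train_unis = set(universities[:num_train_universities])
    train = [b for b, u in building_to_university.items() if u in train_unis]
    test = [b for b, u in building_to_university.items() if u not in train_unis]
    return train, test

# Hypothetical mapping: six buildings spread over three universities.
building_to_university = {
    "b1": "uni_a", "b2": "uni_a",
    "b3": "uni_b", "b4": "uni_b",
    "b5": "uni_c", "b6": "uni_c",
}

train_ids, test_ids = split_by_university(building_to_university, num_train_universities=2)

# Sanity check: the two sides share no university.
train_unis = {building_to_university[b] for b in train_ids}
test_unis = {building_to_university[b] for b in test_ids}
print(train_unis.isdisjoint(test_unis))
```

Splitting at the university level rather than the building level is what keeps every test campus unseen during training.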
Key highlights
- 50,218 training images across drone, street, satellite, and noisy Google street views
- Two tasks: drone-to-satellite target localization and satellite-to-drone navigation
- Supports fp16/bf16, re-ranking, GeM pooling, and multiple-query evaluation
- Pre-trained models and evaluation scripts included; works with ResNet or VGG-16 backbones
- Active workshop series (UAVM at ACM MM) with ongoing challenges through 2026
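For readers who have not met GeM (generalized-mean) pooling, the sketch below shows the formula on a flat list of activations: raise each value to the power p, average, then take the p-th root. It is written in plain Python for clarity and is not the repository's implementation; the function name `gem_pool`, the default `p=3.0` (a common starting value in the literature), and the `eps` clamp are assumptions.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean pooling over a flat list of activations.

    p = 1 gives average pooling; as p grows, the result approaches max pooling.
    Values are clamped to eps so the power is well defined for non-positive inputs.
    """
    clamped = [max(a, eps) for a in activations]
    mean_pow = sum(a ** p for a in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)

feature_map = [1.0, 2.0, 3.0]  # hypothetical activations from one channel

avg_like = gem_pool(feature_map, p=1.0)   # behaves like average pooling
gem_default = gem_pool(feature_map)       # sits between the mean and the max
max_like = gem_pool(feature_map, p=50.0)  # close to max pooling
```

In a real network the same operation is applied per channel over the spatial positions of a feature map, and p is often made a learnable parameter.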
Caveats
- Dataset requires a manual request via GitHub issue (author claims ~5 min response time)
- README is a bit of a kitchen sink: workshop announcements, unrelated special issues, and deprecated torchvision warnings all pile up
- Needs a GPU with at least 8 GB of memory; no CPU fallback is mentioned
Verdict
Worth a look if you’re building visual localization for UAVs or working on cross-view retrieval. Skip it if you need a drop-in, no-registration dataset or if your work stays firmly in the single-view, single-modality lane.