CSAILVision/places365

Pre-trained scene classifiers from MIT: 365 ways to read a room

A grab-bag of CNNs trained to label the kind of place a photo shows, not the objects in it.

2.1k stars · Python · Computer Vision · Data Tooling
Velocity (7d): +0.6 ★/day · Trend: steady

What it does

Places365 ships a collection of pre-trained convolutional neural networks (AlexNet, VGG, ResNet, DenseNet) that classify images into 365 scene categories—patio, food_court, beer_garden, and so on. The models were trained on the Places365-Standard dataset (~1.8 million images) and a larger Places365-Challenge set (~8 million). You get weights in Caffe, Torch, and PyTorch formats, plus two Python scripts: one for bare scene prediction, another that also emits indoor/outdoor labels, scene attributes, and a class activation map.
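To make the "bare scene prediction" step concrete, here is a minimal sketch of the post-processing such a script has to perform: turn the network's raw scores into probabilities and report the most likely categories. The function names, the four-label slice, and the scores are all made up for illustration; a real model emits 365 scores, and this is not the repository's own code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max-subtracted for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_scenes(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical four-class slice; the real models output 365 scores.
labels = ["patio", "food_court", "beer_garden", "cafeteria"]
logits = [2.0, 0.5, 1.0, -1.0]

for name, prob in top_k_scenes(logits, labels, k=3):
    print(f"{prob:.3f} -> {name}")
```

In practice the scores would come from a forward pass of one of the pre-trained networks, and the labels from the category file distributed with the weights.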

The interesting bit

The project treats “scene understanding” as distinct from object detection: the models supply context (“this is a cafeteria”) rather than bounding boxes (“there is a chair”). A ResNet152 trained from scratch on Places365 reports a 44.82% top-1 error. The unified demo script bundles category, attribute, and CAM outputs in one pass, which is more plumbing than most model zoos bother with.
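For readers unfamiliar with the metric, top-1 error is the share of images whose single highest-scoring category is wrong; top-5 error counts a miss only when the true category is absent from the five best guesses. A small sketch of that calculation follows, with a hypothetical helper name and made-up scores rather than real Places365 outputs.

```python
def top_k_error(score_rows, true_indices, k=1):
    """Fraction of samples whose true class is not among the k highest scores."""
    if not score_rows or len(score_rows) != len(true_indices):
        raise ValueError("need a non-empty list with one ground-truth index per row")
    misses = 0
    for scores, truth in zip(score_rows, true_indices):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(score_rows)


# Made-up scores for four images over three classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0: correct at top-1
    [0.1, 0.3, 0.6],  # true class 1: wrong at top-1, found at top-2
    [0.2, 0.5, 0.3],  # true class 1: correct at top-1
    [0.6, 0.3, 0.1],  # true class 2: missed at top-1 and top-2
]
truth = [0, 1, 1, 2]

print(top_k_error(scores, truth, k=1))  # 0.5
print(top_k_error(scores, truth, k=2))  # 0.25
```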

Key highlights

  • Eight model architectures available, including hybrid models trained on ImageNet + Places365 (1,365 categories total)
  • PyTorch models provided, though they were trained under Python 2.7 with PyTorch 0.2; a GitHub issue warns of format gotchas when loading them
  • Indoor/outdoor labels and scene attribute predictions included via the unified script
  • Training script (train_placesCNN.py) and easy-format dataset tar provided if you want to retrain
  • CC BY license; citation to a 2017 IEEE TPAMI paper required
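One plausible way to turn per-category predictions into a single indoor/outdoor label is to let the top-ranked categories vote. The sketch below assumes each category carries a 0 (indoor) or 1 (outdoor) flag; the lookup table, the top-10 window, and the 0.5 threshold are illustrative assumptions, not a transcription of the unified script.

```python
def indoor_or_outdoor(ranked_categories, io_lookup, top_n=10, threshold=0.5):
    """Label a scene by averaging the indoor/outdoor flags of its top categories.

    ranked_categories: category names ordered from most to least probable.
    io_lookup: hypothetical mapping of category name to 0 (indoor) or 1 (outdoor).
    """
    votes = [io_lookup[name] for name in ranked_categories[:top_n]]
    if not votes:
        raise ValueError("need at least one ranked category")
    outdoor_share = sum(votes) / len(votes)
    return "outdoor" if outdoor_share >= threshold else "indoor"


# Hypothetical flags for a handful of categories.
io_lookup = {
    "patio": 1,
    "beer_garden": 1,
    "food_court": 0,
    "cafeteria": 0,
    "kitchen": 0,
}

ranked = ["food_court", "cafeteria", "patio", "kitchen"]
print(indoor_or_outdoor(ranked, io_lookup))  # indoor: one outdoor vote out of four
```

Voting over several categories rather than trusting only the top guess makes the label less sensitive to a single confident mistake.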

Caveats

  • Several README notes are stuck in 2016: “ResidualNet’s performance will be updated soon,” and the PyTorch models target a long-EOL stack
  • The Caffe/Torch heritage means you may spend time translating prototxts or wrestling with loadcaffe scale mismatches (0–255 vs 0–1)
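The scale mismatch in the last caveat comes down to one multiplication, but it is easy to miss because a network fed the wrong range still runs and simply predicts badly. A minimal sketch with hypothetical helper names is shown below; which range a given checkpoint expects is something to confirm against its own preprocessing, not something this snippet decides.

```python
def to_unit_range(pixels):
    """Scale pixel values from the 0-255 range down to 0-1."""
    return [value / 255.0 for value in pixels]


def to_byte_range(pixels):
    """Scale pixel values from the 0-1 range up to 0-255."""
    return [value * 255.0 for value in pixels]


def looks_like_byte_range(pixels):
    """Rough heuristic: any value above 1 suggests 0-255 data."""
    return max(pixels) > 1.0


raw = [0, 127.5, 255]
print(looks_like_byte_range(raw))           # True
print(to_unit_range(raw))                   # [0.0, 0.5, 1.0]
print(to_byte_range(to_unit_range(raw)))    # [0.0, 127.5, 255.0]
```

The heuristic is only a sanity check: a very dark 0-255 image could contain no value above 1, so an explicit, documented convention is safer than detection.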

Verdict

Worth a look if you need off-the-shelf scene context for legacy pipelines or research baselines. Skip it if you want modern, maintained models—today you’d probably fine-tune a CLIP or SigLIP variant instead.
