Surveillance CV that actually ships: CMU's detection-and-tracking pipeline
A battle-tested TensorFlow pipeline for detecting and tracking people and vehicles across multiple surveillance cameras, with speed hacks layered on top.

What it does
This is the open-sourced inference code behind CMU’s DIVA system, which topped the IARPA ActEV leaderboard for video activity detection. It runs Faster R-CNN (ResNet-101 + dilated CNN + FPN) for object detection, then pipes ROI features into Deep SORT for tracking. You also get multi-camera tracking with ReID, plus support for EfficientDet and Mask R-CNN if you want to swap the detector.
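To make the detection-to-tracking handoff concrete, here is a minimal sketch of appearance-based association, the core idea Deep SORT builds on. It is a simplification, not the repo's code: real Deep SORT adds Kalman-filter motion gating and Hungarian matching, whereas this version matches greedily on cosine distance alone. The function names, the toy 3-dimensional features, and the `max_distance` threshold are all assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def associate(tracks, detections, max_distance=0.3):
    """Greedily match detections to existing tracks by appearance.

    tracks: dict mapping track_id -> last known feature vector
    detections: list of feature vectors from the current frame
    Returns (matches, unmatched): matches maps detection index -> track_id,
    unmatched lists detection indices that should start new tracks.
    """
    # Score every (track, detection) pair and visit the closest pairs first.
    pairs = sorted(
        (cosine_distance(feat, det), tid, i)
        for tid, feat in tracks.items()
        for i, det in enumerate(detections)
    )
    matches, used_tracks = {}, set()
    for dist, tid, i in pairs:
        if dist > max_distance:
            break  # every remaining pair is even further apart
        if tid in used_tracks or i in matches:
            continue  # each track and each detection is matched at most once
        matches[i] = tid
        used_tracks.add(tid)
    unmatched = [i for i in range(len(detections)) if i not in matches]
    return matches, unmatched


# Toy example: two known tracks, three detections in the new frame.
tracks = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches, unmatched = associate(tracks, detections)
print(matches, unmatched)
```

The same distance-on-embeddings idea carries over to cross-camera ReID, where the "tracks" would come from a different camera instead of the previous frame.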
The interesting bit
The authors didn’t just wrap existing models; they chased inference speed with a vengeance. They added multi-image batching (~30% faster), multi-thread queuing to parallelize CPU and GPU work (another ~25%), and frozen-graph loading (~30% again). A dedicated SPEED.md documents the gains. It’s the rare academic release that admits to “0.125x real-time” on a 4-GPU box and then systematically attacks the problem.
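The batching and queuing tricks are easier to picture in code. What follows is a rough sketch of the general pattern, not the repo's implementation: a worker thread does CPU-side preprocessing and fills a bounded queue with batches while the main thread consumes them. `load_frame` and `run_model` are hypothetical stand-ins for frame decoding and a batched GPU forward pass, and the batch size and queue depth are arbitrary.

```python
import queue
import threading


def load_frame(index):
    """Hypothetical stand-in for CPU-side decoding and preprocessing."""
    return {"index": index, "pixels": [index] * 4}


def run_model(batch):
    """Hypothetical stand-in for one batched forward pass on the GPU."""
    return [(frame["index"], sum(frame["pixels"])) for frame in batch]


def producer(frame_indices, batch_size, out_queue):
    """Preprocess frames on a worker thread and enqueue them batch by batch."""
    batch = []
    for index in frame_indices:
        batch.append(load_frame(index))
        if len(batch) == batch_size:
            out_queue.put(batch)  # blocks when the queue is full
            batch = []
    if batch:
        out_queue.put(batch)  # final partial batch
    out_queue.put(None)  # sentinel: no more work


def run_pipeline(num_frames, batch_size=4, queue_depth=2):
    """Overlap preprocessing (worker thread) with inference (main thread)."""
    batches = queue.Queue(maxsize=queue_depth)
    worker = threading.Thread(
        target=producer, args=(range(num_frames), batch_size, batches)
    )
    worker.start()
    results = []
    while True:
        batch = batches.get()
        if batch is None:
            break
        results.extend(run_model(batch))
    worker.join()
    return results


results = run_pipeline(num_frames=10, batch_size=4)
print(len(results), "frames processed")
```

The bounded queue is the design choice that matters: it keeps the preprocessing thread from racing ahead and eating memory, while still letting the next batch be ready by the time the model finishes the current one.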
Key highlights
- Ships with ActEV-trained models optimized for small objects in outdoor scenes; COCO-trained models work better indoors
- Supports TMOT (an alternative to Deep SORT) and person/vehicle ReID across camera networks
- The frame-extraction docs warn that OpenCV silently drops duplicate frames in AVI files; use MoviePy or convert to MP4 instead
- Tested on TensorFlow 1.15 with Python 2/3 compatibility; includes scripts for visualization, evaluation, and frame extraction
- Some model links were updated in 2022 after CMU server shutdowns; v4-v6 models are noted as unverified
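If you want to catch the dropped-frame problem before it bites, one cheap sanity check is to compare how many frames you actually decoded against what the clip's duration and frame rate imply. The sketch below assumes a constant frame rate; the helper names, the one-frame tolerance, and the sample numbers are made up for illustration and do not come from the repo.

```python
def expected_frame_count(duration_seconds, fps):
    """Nominal number of frames for a constant-frame-rate clip."""
    return round(duration_seconds * fps)


def check_extraction(decoded_frames, duration_seconds, fps, tolerance=1):
    """Compare the decoded frame count with the nominal count.

    Returns (ok, missing), where missing is how many frames short
    the extraction is (negative if it produced extra frames).
    """
    expected = expected_frame_count(duration_seconds, fps)
    missing = expected - decoded_frames
    return abs(missing) <= tolerance, missing


# Illustrative numbers: a 5-minute clip at 30 fps that decoded to 8950 frames.
ok, missing = check_extraction(decoded_frames=8950, duration_seconds=300, fps=30)
print("extraction ok:", ok, "| frames missing:", missing)
```

A non-trivial shortfall is a hint to switch extractors or re-encode before trusting any frame-indexed annotations.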
Caveats
- Several model links broke when CMU shut down servers; the author has patched some but not all
- v4-v6 models lack a test set with ground truth, so you’re on your own for validation
- The README trails off mid-command in the TMOT section, suggesting the docs may not be fully maintained
Verdict
Grab this if you’re building a production surveillance pipeline and need a proven starting point with detection, tracking, and multi-camera ReID already wired together. Skip it if you need a clean, modern PyTorch codebase: this is TensorFlow 1.x territory with academic-code ergonomics.