zhaoweicai/cascade-rcnn

Caffe-era object detection, now with extra stages

The code release for a CVPR 2018 paper that squeezes better precision out of two-stage detectors by chaining detection stages like a quality-control assembly line.

1.1k stars · C++ · Computer Vision
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

This repo implements Faster R-CNN, R-FCN, FPN, and the authors’ own Cascade R-CNN in Caffe, targeting MS-COCO and PASCAL VOC. You get a menu of backbones (AlexNet, VGG, ResNet) and a pile of shell scripts to train each combination. The core claim: you can make any two-stage detector pickier about false positives by passing its boxes through successive detection stages, each trained with a stricter IoU threshold.

The interesting bit

The cascade isn’t just stacking identical models. Each stage takes the boxes refined by the previous stage and trains against a higher IoU bar, so later stages learn to tighten or reject the borderline detections that earlier ones let through. The README tables show consistent gains (roughly 3–4 AP points on COCO, sometimes more on VOC) across baselines of varying strength, which is the kind of “reliable” the authors emphasize.
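To make the rising IoU bar concrete, here is a minimal Python sketch of how training labels could be assigned stage by stage. It is a toy, not the repo's Caffe code: the thresholds 0.5, 0.6, 0.7 are the ones used in the Cascade R-CNN paper (this page does not state them), and `refine` is a hypothetical stand-in for a learned box regressor that simply moves each box halfway toward the ground truth.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Assumption: per-stage positive thresholds as in the Cascade R-CNN paper.
STAGE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine(box: Box, gt: Box, step: float = 0.5) -> Box:
    """Hypothetical regressor: move the box part of the way toward gt.

    A real stage predicts this shift from image features; using the
    ground truth directly only keeps the toy example self-contained.
    """
    return tuple(b + step * (g - b) for b, g in zip(box, gt))


def cascade_labels(
    proposals: Sequence[Box],
    gt: Box,
    thresholds: Sequence[float] = STAGE_IOU_THRESHOLDS,
) -> List[List[bool]]:
    """Return, for each stage, whether each box counts as a positive."""
    labels = []
    boxes = list(proposals)
    for thr in thresholds:
        # Label against this stage's (stricter) IoU threshold.
        labels.append([iou(b, gt) >= thr for b in boxes])
        # The next stage trains on this stage's refined boxes.
        boxes = [refine(b, gt) for b in boxes]
    return labels


if __name__ == "__main__":
    gt = (0.0, 0.0, 100.0, 100.0)
    proposals = [(10.0, 10.0, 110.0, 110.0), (30.0, 30.0, 130.0, 130.0)]
    for stage, row in enumerate(cascade_labels(proposals, gt), start=1):
        print(f"stage {stage}: {row}")
```

In this toy run the second proposal starts below the 0.5 bar, yet after two rounds of refinement it clears 0.7. That is the point of resampling: each stage sees better boxes than the one before, so a stricter threshold still leaves it enough positives to train on.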

Key highlights

  • Ships with pretrained ResNet-50/101 FPN and ResNet-101 R-FCN models for COCO and VOC
  • Benchmark tables include training times, GPU counts, and per-image inference latency (e.g., Res101-FPN-Cascade at 0.14s)
  • MATLAB wrapper required for the official evaluation demo; shell-script evaluation exists but “is not identical to the official evaluation”
  • Authors now recommend PyTorch (mmdetection) or TensorFlow (tensorpack) implementations for new work
  • FPN and roi_align were re-implemented from paper descriptions, so details may diverge from Detectron

Caveats

  • CUDA 8.0 and cuDNN 6.0.20 were tested; newer versions “should be working”—famous last words
  • Res101-FPN-Cascade training occasionally OOMs and needs resuming from solverstate
  • Only a subset of pretrained models are hosted; VGG and AlexNet checkpoints require fetching and pruning yourself
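For the OOM caveat, resuming means pointing Caffe at the newest `.solverstate` snapshot. The sketch below picks that file by iteration number and builds the matching `caffe train` command. It assumes Caffe's default snapshot naming (`<prefix>_iter_<N>.solverstate`, not the HDF5 `.solverstate.h5` variant); the file names, the `caffe` binary path, and both helper functions are hypothetical, and the repo's own training scripts may pass additional flags.

```python
import re
from typing import List, Optional

# Caffe's default binary-proto snapshots end in "_iter_<N>.solverstate".
_SNAPSHOT_RE = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_solverstate(names: List[str]) -> Optional[str]:
    """Return the snapshot with the highest iteration number, if any."""
    best, best_iter = None, -1
    for name in names:
        match = _SNAPSHOT_RE.search(name)
        # Compare iterations numerically: as strings, "90000" sorts after "100000".
        if match and int(match.group(1)) > best_iter:
            best, best_iter = name, int(match.group(1))
    return best


def resume_command(solver: str, snapshot: str, caffe_bin: str = "caffe") -> List[str]:
    """Build the argument list for resuming training from a snapshot."""
    return [caffe_bin, "train", f"--solver={solver}", f"--snapshot={snapshot}"]


if __name__ == "__main__":
    # Hypothetical directory listing after a crashed run.
    files = [
        "cascadercnn_iter_90000.solverstate",
        "cascadercnn_iter_100000.solverstate",
        "cascadercnn_iter_100000.caffemodel",
        "log.txt",
    ]
    snapshot = latest_solverstate(files)
    if snapshot is not None:
        print(" ".join(resume_command("solver.prototxt", snapshot)))
```

The command is returned as a list so it can be handed to `subprocess.run` without shell quoting issues.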

Verdict

Worth a look if you’re reproducing 2018 CVPR results or stuck maintaining a Caffe pipeline. Everyone else should probably follow the authors’ own advice and use mmdetection or tensorpack instead.
