The 32k-star kitchen sink for object detection
MMDetection is where research code goes to become reproducible infrastructure.

What it does
MMDetection is a PyTorch-based toolbox that bundles object detection, instance segmentation, panoptic segmentation, and semi-supervised detection under one modular roof. It ships with 300+ pre-trained configs spanning YOLO to DETR to its own RTMDet family, plus a full training and evaluation pipeline.
The interesting bit
The project treats “modular” as a discipline, not a buzzword: every component (backbone, neck, head, loss) is swappable via config files, so you can Frankenstein a new detector without forking the universe. The recent MM-Grounding-DINO release is notable because it open-sources the training pipeline for a model whose original authors kept that part closed.
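To make the "swappable via config" idea concrete, here is a minimal sketch in plain Python that runs without MMDetection installed. The component names (`FasterRCNN`, `ResNet`, `FPN`, `RPNHead`, `SwinTransformer`) and the `_delete_` key follow MMDetection/MMEngine conventions, but `merge_config` is a hypothetical helper written for illustration, not MMEngine's actual merge logic, and the field values are assumptions chosen to look plausible.

```python
import copy

# A detector described as nested dicts, in the style MMDetection configs use.
base_model = dict(
    type="FasterRCNN",
    backbone=dict(type="ResNet", depth=50),
    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256),
    rpn_head=dict(type="RPNHead", in_channels=256),
)


def merge_config(base, override):
    """Hypothetical helper: overlay `override` onto a deep copy of `base`.

    Nested dicts are merged key by key. A dict carrying `_delete_=True`
    replaces the base entry wholesale instead of being merged into it.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's override is untouched
            replace = value.pop("_delete_", False)
            if not replace and isinstance(merged.get(key), dict):
                merged[key] = merge_config(merged[key], value)
                continue
        merged[key] = copy.deepcopy(value)
    return merged


# Swap the backbone and adjust only the neck's input channels to match it.
swin_override = dict(
    backbone=dict(_delete_=True, type="SwinTransformer", embed_dims=96),
    neck=dict(in_channels=[96, 192, 384, 768]),
)

model = merge_config(base_model, swin_override)
print(model["backbone"])
print(model["neck"])
```

The backbone is replaced outright, while the neck keeps its `type` and `out_channels` and only picks up the new `in_channels`. The detection head is untouched, which is the point of the modular design.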
Key highlights
- Claims training speed “faster than or comparable to” Detectron2 and maskrcnn-benchmark; all bbox/mask ops run on GPU
- RTMDet reaches 52.8 AP on COCO detection at 322 FPS (TensorRT FP16, RTX 3090), per their benchmark table
- Supports rotated object detection (aerial/satellite imagery) and real-time instance segmentation out of the box
- Depends heavily on sister projects MMEngine (training) and MMCV (CV primitives)
- Descends from the codebase of the team that won the COCO Detection Challenge in 2018; actively maintained, with v3.3.0 released in January 2024
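Throughput figures like the RTMDet number above are easier to reason about as per-image latency. The conversion below is plain arithmetic and assumes the benchmark processes one image at a time (batch size 1), which the summary does not state; `fps_to_latency_ms` is a name made up for this sketch.

```python
def fps_to_latency_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 322 FPS is the RTMDet figure quoted in the highlights.
latency_ms = fps_to_latency_ms(322)
print(f"{latency_ms:.2f} ms per image")
```

Under that assumption, 322 FPS works out to roughly 3.1 ms per image.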
Caveats
- The README states little about requirements beyond “PyTorch 1.8+”; for hardware, multi-GPU, or custom dataset specifics you’ll need to dig into the docs
- Modular means modular complexity: the config system has a learning curve, and the dependency stack (MMEngine + MMCV + PyTorch) is non-trivial to debug when versions drift
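When versions drift, a sensible first step is to record what is actually installed. The sketch below uses only the Python standard library and assumes the packages are published under the distribution names `torch`, `mmengine`, `mmcv`, and `mmdet`. It reports versions without judging compatibility, since the supported combinations are documented by the projects themselves and are not reproduced here.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed for the stack described above.
PACKAGES = ("torch", "mmengine", "mmcv", "mmdet")


def installed_versions(packages=PACKAGES):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name:10s} {ver or 'not installed'}")
```

Pasting this output into a bug report, or comparing it against the project's installation notes, usually narrows a version mismatch faster than reading stack traces.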
Verdict
Worth it if you’re doing detection research, need a reproducible baseline, or want to benchmark against published numbers without reimplementing half of CVPR. Skip it if you just need to run inference on a single model; there are leaner wrappers for that.