NVIDIA's RetinaNet: object detection with the training wheels welded on
A reference implementation that squeezes every drop of GPU performance from training to TensorRT inference, including a feature most detectors skip: rotated bounding boxes.

What it does
ODTK is NVIDIA’s take on RetinaNet, a single-stage object detector. It wraps the full lifecycle—training, fine-tuning, export, and inference—into a single odtk CLI tool. The selling point isn’t novelty; it’s integration. PyTorch for training, TensorRT for inference, DALI for data loading, Apex for mixed precision, DeepStream for video pipelines. Pick your backbone (ResNet18 to ResNet152, plus MobileNetV2) based on how much accuracy you can afford to trade for speed.
The interesting bit
Rotated bounding box support is genuinely unusual. Most detectors assume everything is axis-aligned; ODTK accepts [x, y, w, h, theta] annotations and handles the geometry. The performance table is also refreshingly specific—actual milliseconds and FPS on named hardware (V100, T4, A100), not vague “up to” claims.
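To make "handles the geometry" concrete, here is a minimal Python sketch of the most basic step: turning an [x, y, w, h, theta] box into its four corners, and into the axis-aligned box that would have to enclose it. It assumes (x, y) is the box centre and theta is an angle in radians. ODTK's exact convention (origin, units, rotation direction) is not spelled out here, so treat those as assumptions. The function names are hypothetical, not ODTK APIs.

```python
import math


def rotated_box_corners(x, y, w, h, theta):
    """Return the four (px, py) corners of a rotated box.

    ASSUMPTION: (x, y) is the box centre and theta is in radians.
    This convention is illustrative, not taken from ODTK's source.
    In image coordinates the y-axis points down, so a positive theta
    appears clockwise on screen.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = w / 2.0, h / 2.0
    corners = []
    # Corner offsets from the centre before rotation
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        # Rotate the offset by theta, then translate back to the centre
        px = x + dx * cos_t - dy * sin_t
        py = y + dx * sin_t + dy * cos_t
        corners.append((px, py))
    return corners


def axis_aligned_envelope(x, y, w, h, theta):
    """Smallest [x_min, y_min, x_max, y_max] box containing the rotated box."""
    pts = rotated_box_corners(x, y, w, h, theta)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [min(xs), min(ys), max(xs), max(ys)]


if __name__ == "__main__":
    env = axis_aligned_envelope(0, 0, 4, 2, math.pi / 4)
    area = (env[2] - env[0]) * (env[3] - env[1])
    print(f"envelope area: {area:.1f} vs box area: {4 * 2}")
```

For example, a 4 × 2 box rotated 45° has an axis-aligned envelope of area 18, more than twice its own area of 8. That slack is what an axis-aligned detector has to live with for tilted objects.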
Key highlights
- End-to-end GPU optimization: training (Apex/DALI) through inference (TensorRT FP16/INT8)
- Rotated bounding box detection with the --rotated-bbox flag
- Pre-trained models with published mAP and latency for six backbones
- INT8 calibration with cacheable calibration tables for repeatability
- A single odtk command handles training, inference, export, and evaluation
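A calibration table is worth caching because the expensive pass over calibration images then runs once, and every later engine build reuses identical scales. The Python sketch below illustrates that idea with the simplest scheme, symmetric max-abs calibration. This is a stand-in: TensorRT provides more sophisticated calibrators (entropy and min-max among them) with their own cache format. The JSON layout and all function names here are hypothetical.

```python
import json
import os


def compute_scale(activations):
    """Symmetric max-abs calibration: map the largest magnitude seen to 127."""
    amax = max(abs(v) for v in activations)
    return amax / 127.0 if amax > 0 else 1.0


def quantize(value, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    q = round(value / scale)
    return max(-127, min(127, q))


def load_or_calibrate(cache_path, calibration_data):
    """Reuse a cached scale table if present; otherwise calibrate and write it.

    calibration_data maps a tensor name to a list of sample activations.
    HYPOTHETICAL JSON layout {tensor_name: scale}; this is not
    TensorRT's calibration cache format.
    """
    if os.path.exists(cache_path):
        # Cache hit: skip calibration entirely, so scales stay identical
        with open(cache_path) as f:
            return json.load(f)
    table = {name: compute_scale(vals) for name, vals in calibration_data.items()}
    with open(cache_path, "w") as f:
        json.dump(table, f)
    return table
```

Once the cache file exists, a second call returns the stored scales even if it is handed different calibration data. That is the repeatability the cached table buys, and it is also why a stale cache must be deleted when the model changes.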
Caveats
- Explicitly labeled a “research project, not an official NVIDIA product”
- Jetson deployment requires pinning to the 19.10 branch, because the main branch targets TensorRT 7, which JetPack 4.3 does not ship
- Docker-based workflow; running bare-metal is left as an exercise
Verdict
Worth a look if you’re building a production detection pipeline on NVIDIA hardware and want a reference that actually reaches TensorRT. Skip it if you need CPU inference, non-NVIDIA GPUs, or a framework-agnostic solution.