A DETR that actually runs in real time
Roboflow built a transformer-based detector and segmenter that beats YOLO variants on COCO while keeping latency low enough for production.

What it does
RF-DETR is a real-time object detection and instance segmentation model built on a DINOv2 vision transformer backbone. It comes in sizes from Nano to 2XLarge, with a single Python API for both tasks. The rfdetr package installs via pip and targets Python 3.10+.
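A minimal detection sketch, assuming the `RFDETRNano` class and its `predict(image, threshold=...)` method as shown in the project README at the time of writing. Class names have shifted between releases, so check them against your installed version. The `detect` wrapper and its threshold check are hypothetical additions for illustration.

```python
from typing import Any


def detect(image_path: str, threshold: float = 0.5) -> Any:
    """Run the Nano RF-DETR detector on one image.

    Sketch only: RFDETRNano and predict(..., threshold=...) are assumed from
    the rfdetr README; this wrapper itself is hypothetical.
    """
    # Validate before importing anything heavy, so bad input fails fast.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")

    # Lazy imports: this module loads even where rfdetr/Pillow are missing.
    from PIL import Image
    from rfdetr import RFDETRNano  # assumed class name for the Nano size

    model = RFDETRNano()  # assumed to load COCO-pretrained weights by default
    image = Image.open(image_path).convert("RGB")
    # Assumed to return a supervision.Detections object (boxes, class ids, scores).
    return model.predict(image, threshold=threshold)
```

After `pip install rfdetr`, usage would look something like `detections = detect("street.jpg", threshold=0.4)`, with the image path standing in for your own data.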
The interesting bit
Transformer detectors have historically been accurate but sluggish. RF-DETR claims to square that circle: on COCO it outperforms YOLO11 and YOLO26 across most sizes, with detection latency as low as 2.3 ms on an NVIDIA T4 (TensorRT, FP16, batch 1). The instance segmentation variants are similarly positioned. Whether this holds on your hardware depends on your TensorRT setup, but the benchmark methodology is at least public—see roboflow/sab for reproducibility details.
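To put the headline number in perspective, the latency-to-throughput arithmetic can be sketched as below. It treats the quoted 2.3 ms as pure model time at batch 1; real pipelines add decoding, preprocessing, and postprocessing, so the result is a ceiling rather than a prediction. The function names are hypothetical.

```python
def max_throughput_fps(latency_ms: float, batch_size: int = 1) -> float:
    """Upper bound on images per second implied by a per-batch latency."""
    if latency_ms <= 0 or batch_size < 1:
        raise ValueError("latency must be positive and batch_size >= 1")
    return batch_size * 1000.0 / latency_ms


def headroom_ms(latency_ms: float, target_fps: float) -> float:
    """Milliseconds left per frame for everything else at a target frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    frame_budget_ms = 1000.0 / target_fps
    return frame_budget_ms - latency_ms


fps = max_throughput_fps(2.3)             # model-only ceiling at batch 1
spare = headroom_ms(2.3, target_fps=30)   # time left in a 30 FPS frame
print(f"{fps:.1f} FPS ceiling, {spare:.1f} ms spare at 30 FPS")
```

This prints `434.8 FPS ceiling, 31.0 ms spare at 30 FPS`. A negative headroom would mean the model alone cannot sustain the target rate on that hardware.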
Key highlights
- Detection and segmentation in one model family with a consistent API
- Apache 2.0 license for base models (N through L); XL/2XL detection models sit under a separate PML 1.0 license via rfdetr_plus
- Benchmarked against YOLO11, YOLO26, LW-DETR, and D-FINE on both COCO and Roboflow’s RF100-VL dataset
- Hugging Face Space, Colab fine-tuning notebook, and arXiv paper (2511.09554) available
- Requires Python ≥3.10
Caveats
- The XL and 2XL detection models are not Apache 2.0; check license terms before commercial use
- Source install from the develop branch is explicitly flagged as potentially unstable
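Because license terms differ by size, a pre-deployment check can be as simple as a lookup. This is a hypothetical sketch built only from the licensing notes above: it covers detection models only, assumes the intermediate sizes are named Small and Medium, and uses the SPDX identifier `Apache-2.0` alongside an informal `PML-1.0` label.

```python
# Hypothetical table from the licensing notes; detection models only.
# "Apache-2.0" is the SPDX identifier; "PML-1.0" is an informal label here.
DETECTION_LICENSES = {
    "nano": "Apache-2.0",
    "small": "Apache-2.0",   # assumed size name
    "medium": "Apache-2.0",  # assumed size name
    "large": "Apache-2.0",
    "xlarge": "PML-1.0",     # distributed via rfdetr_plus
    "2xlarge": "PML-1.0",    # distributed via rfdetr_plus
}


def needs_license_review(size: str) -> bool:
    """Return True if the detection model size is not covered by Apache 2.0."""
    key = size.strip().lower()
    if key not in DETECTION_LICENSES:
        raise ValueError(f"unknown model size: {size!r}")
    return DETECTION_LICENSES[key] != "Apache-2.0"


for size in ("Nano", "Large", "XLarge", "2XLarge"):
    print(size, "-> review license" if needs_license_review(size) else "-> Apache 2.0")
```

Failing loudly on an unknown size is deliberate: a new or renamed variant should prompt a fresh look at its license rather than silently pass.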
Verdict
Worth a look if you’re running object detection in production and want to escape YOLO’s licensing orbit—or if you’ve been waiting for DETR-style architectures to get fast enough to deploy. Skip if you’re married to a different framework and don’t need the accuracy edge.