roboflow/rf-detr

A DETR that actually runs in real time

Roboflow built a transformer-based detector and segmenter that beats YOLO variants on COCO while keeping latency low enough for production.

7.6k stars · Python · Computer Vision · ML Frameworks
Velocity (7d): +17 ★/day · Trend: steady

What it does

RF-DETR is a real-time object detection and instance segmentation model built on a DINOv2 vision transformer backbone. It comes in sizes from Nano to 2XLarge, with a single Python API for both tasks. The rfdetr package installs via pip and targets Python 3.10+.
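
Before installing, it can help to confirm the environment meets the stated Python floor. The following is a minimal sketch, not something shipped with the project: `check_environment` is a hypothetical helper that uses only the standard library to compare the interpreter version against 3.10 and to see whether a module named `rfdetr` is already importable.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated for the rfdetr package


def check_environment(version=None, module_name="rfdetr"):
    """Return (python_ok, package_installed) for the given interpreter version.

    Hypothetical helper for illustration; it is not part of rfdetr.
    """
    if version is None:
        version = sys.version_info[:2]
    python_ok = tuple(version[:2]) >= MIN_PYTHON
    # find_spec returns None when a top-level module cannot be located.
    package_installed = importlib.util.find_spec(module_name) is not None
    return python_ok, package_installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print(f"Python >= 3.10: {python_ok}")
    print(f"rfdetr importable: {installed}")
    if python_ok and not installed:
        print("Install with: pip install rfdetr")
```

Passing `version` explicitly makes the check easy to exercise for interpreters other than the one currently running.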

The interesting bit

Transformer detectors have historically been accurate but sluggish. RF-DETR claims to square that circle: on COCO it outperforms YOLO11 and YOLO26 across most sizes, with detection latency as low as 2.3 ms on an NVIDIA T4 (TensorRT, FP16, batch 1). The instance segmentation variants are similarly positioned. Whether this holds on your hardware depends on your TensorRT setup, but the benchmark methodology is at least public—see roboflow/sab for reproducibility details.
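
To see how the latency figure translates to your own hardware, one option is a small timing harness. This is a sketch under stated assumptions rather than the roboflow/sab methodology: `measure_latency_ms` and `latency_to_fps` are hypothetical helpers, `infer` stands in for whatever callable runs one batch-1 inference, and on a GPU that callable would need to synchronize the device before returning for the timings to mean anything.

```python
import statistics
import time


def measure_latency_ms(infer, warmup=10, iters=100):
    """Return the median wall-clock latency of `infer()` in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up runs are discarded
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def latency_to_fps(latency_ms):
    """Convert per-image latency at batch size 1 to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Stand-in workload; replace with a real single-image inference call.
    median_ms = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"median latency: {median_ms:.3f} ms")
    # The quoted 2.3 ms figure corresponds to roughly 435 images per second.
    print(f"2.3 ms -> {latency_to_fps(2.3):.0f} FPS")
```

The median is used instead of the mean so that a few slow iterations (scheduler hiccups, clock changes) do not skew the result.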

Key highlights

  • Detection and segmentation in one model family with a consistent API
  • Apache 2.0 license for base models (N through L); XL/2XL detection models sit under a separate PML 1.0 license via rfdetr_plus
  • Benchmarked against YOLO11, YOLO26, LW-DETR, and D-FINE on both COCO and Roboflow’s RF100-VL dataset
  • Hugging Face Space, Colab fine-tuning notebook, and arXiv paper (2511.09554) available
  • Requires Python ≥3.10

Caveats

  • The XL and 2XL detection models are not Apache 2.0; check license terms before commercial use
  • Source install from the develop branch is explicitly flagged as potentially unstable

Verdict

Worth a look if you’re running object detection in production and want to escape YOLO’s licensing orbit—or if you’ve been waiting for DETR-style architectures to get fast enough to deploy. Skip if you’re married to a different framework and don’t need the accuracy edge.
