NVIDIA-AI-IOT/yolo_deepstream

YOLO on Jetson: the full pipeline from quantized training to DeepStream

NVIDIA's reference repo for squeezing YOLOv7 onto edge hardware without rewriting your entire stack.

Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Four connected samples that take YOLOv4/v7 from PyTorch training through INT8 quantization to TensorRT inference and finally DeepStream deployment. The yolov7_qat folder handles Quantization-Aware Training with NVIDIA’s pytorch-quantization toolkit; tensorrt_yolov7 and tensorrt_yolov4 provide standalone C++ apps for engine benchmarking; deepstream_yolo wires the result into NVIDIA’s streaming analytics SDK with custom output-layer parsing.

The interesting bit

The QAT workflow is the real meat. The repo includes explicit rules for Q/DQ (quantize/dequantize) node placement (rules.py) and a guidance doc on performance optimization, an acknowledgment that where you place quantization nodes matters more than the decision to quantize in the first place. On Jetson AGX Orin, their INT8 QAT/PTQ engines hit 264 FPS at batch-16 versus 162 for FP16 (roughly 1.6x), with the README noting only a small mAP drop.
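To make the Q/DQ idea concrete, here is a minimal sketch of what one quantize/dequantize pair does to a tensor's values during QAT. It is not the repo's code or the pytorch-quantization API: the function names, the symmetric per-tensor scale, and the calibrated range `amax` are all assumptions chosen for illustration.

```python
# Illustrative only: a symmetric, per-tensor INT8 "fake quantization" pass.
# quantize/dequantize/fake_quantize are hypothetical helper names, not toolkit APIs.

def quantize(x, scale):
    """Q node: map a float to a signed 8-bit integer, clamping out-of-range values."""
    q = round(x / scale)
    return max(-128, min(127, q))


def dequantize(q, scale):
    """DQ node: map the 8-bit integer back to a float."""
    return q * scale


def fake_quantize(values, amax):
    """Run every value through a Q/DQ pair, as QAT does in the forward pass.

    amax is the assumed calibrated absolute maximum of the tensor.
    """
    scale = amax / 127.0
    return [dequantize(quantize(v, scale), scale) for v in values]


if __name__ == "__main__":
    activations = [0.03, -0.5, 1.27, 2.0, -3.1]  # made-up sample values
    for before, after in zip(activations, fake_quantize(activations, amax=2.0)):
        print(f"{before:+.4f} -> {after:+.4f} (error {abs(before - after):.4f})")
```

Values inside the calibrated range come back with a rounding error of at most half a quantization step, while values outside it are clamped. Training with these pairs in place lets the network adapt to that error, and each pair you add marks a spot where the engine must hold INT8 data. That is one reason placement matters.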

Key highlights

  • End-to-end: PyTorch QAT → ONNX export → TensorRT engine → DeepStream pipeline
  • Standalone C++ TensorRT apps for YOLOv4 and YOLOv7 with image, video, and COCO validation modes
  • DeepStream integration includes custom nvdsparsebbox_Yolo.cpp for parsing YOLO’s detection output format
  • Performance table covers Jetson Orin-X and Tesla T4 with FP16 and INT8, single and multi-stream
  • cuda-post-process vs cpu-post-process comparison shows where the bottleneck actually lives
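For a sense of what output parsing involves, the sketch below decodes raw detection rows into boxes. It is written in Python for brevity, whereas the repo's parser is C++. The row layout (`cx, cy, w, h, objectness`, then per-class scores) is an assumption based on the common YOLO export format, not something taken from nvdsparsebbox_Yolo.cpp, and `parse_detections` and its dictionary keys are hypothetical names.

```python
# Illustrative only: decode assumed YOLO-style rows into top-left boxes.
# Non-maximum suppression, which a real pipeline also needs, is omitted.

def parse_detections(rows, conf_threshold=0.25):
    """Convert rows of [cx, cy, w, h, objectness, class scores...] to box dicts."""
    detections = []
    for row in rows:
        cx, cy, w, h, objectness = row[:5]
        class_scores = row[5:]
        # Pick the most likely class for this candidate box.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = objectness * class_scores[class_id]
        if score < conf_threshold:
            continue  # drop low-confidence candidates
        detections.append({
            "left": cx - w / 2,   # centre format -> top-left corner
            "top": cy - h / 2,
            "width": w,
            "height": h,
            "class_id": class_id,
            "score": score,
        })
    return detections


if __name__ == "__main__":
    sample = [
        [100.0, 100.0, 40.0, 20.0, 0.9, 0.1, 0.8],  # made-up confident detection
        [50.0, 50.0, 10.0, 10.0, 0.2, 0.5, 0.5],    # made-up weak detection
    ]
    for det in parse_detections(sample):
        print(det)
```

A loop like this runs once per candidate box for every frame, so whether it executes on the CPU or the GPU can plausibly show up in end-to-end throughput.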

Caveats

  • README grammar and formatting are rough; some sentences are unclear (“same performance of PTQ in TensorRT” is ambiguous)
  • No explicit license mentioned in the provided README text
  • DeepStream doesn’t support cudaGraph, so the trtexec numbers aren’t directly comparable to the streaming path

Verdict

Grab this if you’re building a production YOLO pipeline on Jetson and need a quantization reference: not a tutorial, but a working config to adapt. Skip it if you just want a quick Python demo; this is C++ and Makefile territory with NVIDIA SDK dependencies.
