megvii-research/PETR
Transformer-based 3D object detection and BEV segmentation model that processes multi-camera images using position embedding transformation.

Velocity (7d): +0.7 ★/day · Trend: steady
PETR and PETRv2 are deep learning models for 3D perception from multi-view camera images, published at ECCV 2022 and ICCV 2023 respectively. PETR introduces a position embedding transformation that encodes 3D coordinate information into the image features, which allows end-to-end object detection. PETRv2 extends this with temporal modeling that draws on previous-frame information, and adds a set of segmentation queries for bird's-eye-view (BEV) segmentation.
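
To make the idea of encoding 3D coordinates into image features more concrete, here is a minimal sketch of the geometric step only. It assumes a simple pinhole camera, a fixed list of depth bins, and a rigid camera-to-ego transform. Every name and number in it (`backproject`, `cam_to_ego`, `frustum_coords`, the intrinsics, the region of interest) is hypothetical and chosen for illustration; none of it is taken from the PETR codebase. In the actual model, coordinates like these would then pass through a learned encoder before being combined with the image features; that learned part is omitted here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point:
    """Pinhole back-projection of pixel (u, v) at a given depth into the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def cam_to_ego(p: Point, rotation: Sequence[Sequence[float]],
               translation: Sequence[float]) -> Point:
    """Apply a rigid transform (3x3 rotation, then translation) to a 3D point."""
    x, y, z = (
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def normalize(p: Point, roi_min: Point, roi_max: Point) -> Point:
    """Scale a point so that the region of interest maps onto [0, 1] per axis."""
    x, y, z = (
        (p[i] - roi_min[i]) / (roi_max[i] - roi_min[i]) for i in range(3)
    )
    return (x, y, z)


def frustum_coords(feat_w: int, feat_h: int, depths: Sequence[float], stride: int,
                   intrinsics: Tuple[float, float, float, float],
                   rotation: Sequence[Sequence[float]], translation: Sequence[float],
                   roi_min: Point, roi_max: Point) -> List[List[float]]:
    """For each feature-map cell, stack normalized 3D points over all depth bins.

    Returns feat_h * feat_w rows (row-major), each of length 3 * len(depths).
    """
    fx, fy, cx, cy = intrinsics
    coords: List[List[float]] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Pixel at the centre of this feature-map cell.
            u = (col + 0.5) * stride
            v = (row + 0.5) * stride
            cell: List[float] = []
            for d in depths:
                p = backproject(u, v, d, fx, fy, cx, cy)
                p = cam_to_ego(p, rotation, translation)
                cell.extend(normalize(p, roi_min, roi_max))
            coords.append(cell)
    return coords


# Illustrative values only (assumptions, not real calibration data).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coords = frustum_coords(
    feat_w=2, feat_h=2, depths=[10.0, 20.0], stride=16,
    intrinsics=(100.0, 100.0, 16.0, 16.0),
    rotation=identity, translation=[0.0, 0.0, 0.0],
    roi_min=(-50.0, -50.0, 0.0), roi_max=(50.0, 50.0, 50.0),
)
print(len(coords), len(coords[0]))
```

Each row of `coords` is a per-pixel vector that depends only on camera geometry, not on image content. That is what lets the same 2D backbone features become position-aware for every camera in the rig once such a vector is embedded and merged with them.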