bradyz/cross_view_transformers
A transformer-based model that fuses multi-view camera images to produce semantic map-view segmentation at 45 FPS for autonomous driving.

This repository implements Cross-view Transformers, the model from a CVPR 2022 paper. It takes images from multiple cameras (e.g., front, back, sides) and predicts semantic segmentation in a top-down map coordinate frame. Cross-view attention lets the model learn how image pixels correspond to map locations. The repository supports the nuScenes and KITTI datasets and runs in real time at 45 FPS. Given the vehicle pose, predictions can be fused over time to construct a map.
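
To make the idea of cross-view attention concrete, here is a minimal NumPy sketch, not the repository's implementation. It assumes one query vector per map cell and one key/value vector per image feature location, and it uses a single attention head over random features. The function name `cross_view_attention` and all shapes are hypothetical, chosen only for illustration. Positional and camera-geometry encodings are left out entirely.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(map_queries, image_keys, image_values):
    """Single-head cross-attention from map cells to all camera pixels.

    map_queries:  (Q, D)    one query per map-view cell (hypothetical layout)
    image_keys:   (N, P, D) N cameras, P feature locations per camera
    image_values: (N, P, D) features to aggregate into the map view
    Returns the fused map features (Q, D) and attention weights (Q, N * P).
    """
    n_cams, n_pix, dim = image_keys.shape
    # Flatten all cameras into one sequence so every map cell can
    # attend to every pixel of every view.
    keys = image_keys.reshape(n_cams * n_pix, dim)
    values = image_values.reshape(n_cams * n_pix, dim)
    scores = map_queries @ keys.T / np.sqrt(dim)  # scaled dot product
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ values, weights


# Toy example: a 4x4 map grid, 6 cameras, 8 feature locations each.
rng = np.random.default_rng(0)
queries = rng.normal(size=(16, 32))
keys = rng.normal(size=(6, 8, 32))
values = rng.normal(size=(6, 8, 32))

fused, weights = cross_view_attention(queries, keys, values)
print(fused.shape, weights.shape)
```

Each row of `weights` is a distribution over all pixels in all views, so a map cell can draw on whichever cameras observe it. In a trained model the queries, keys, and values would come from learned embeddings and image features rather than random draws.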