autonomousvision/transfuser
A transformer-based deep learning system for autonomous driving that fuses camera and LiDAR sensor data via attention mechanisms for end-to-end vehicle control.

TransFuser fuses multi-modal sensor inputs (camera images and LiDAR data) using transformer attention. The model is trained end-to-end with imitation learning and predicts steering and throttle commands from sensor data. It extends the CVPR 2021 fusion transformer with improved sensor fusion strategies and reports state-of-the-art results on the CARLA leaderboard.
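
To make the attention-based fusion idea concrete, here is a minimal, dependency-free sketch in plain Python. It is not the repository's implementation: the names `self_attention` and `fuse`, the single attention head, the identity query/key/value projections, and the toy 2-dimensional tokens are all assumptions chosen for brevity. It only illustrates the general pattern of concatenating camera and LiDAR feature tokens, letting every token attend to every other token, and splitting the result back per modality.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # Similarity of this query to every token, from either modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(camera_tokens, lidar_tokens):
    """Hypothetical fusion step: joint attention plus a residual connection."""
    tokens = camera_tokens + lidar_tokens
    attended = self_attention(tokens)
    fused = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    n = len(camera_tokens)
    return fused[:n], fused[n:]  # split back into the two modalities


# Toy feature tokens (made-up values, not real sensor features).
camera = [[1.0, 0.0], [0.0, 1.0]]
lidar = [[0.5, 0.5]]
cam_fused, lidar_fused = fuse(camera, lidar)
print(len(cam_fused), len(lidar_fused))
```

Because attention runs over the concatenated token list, each fused camera token carries information from the LiDAR token and vice versa, which is the property the description above refers to as fusion via attention.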