DocF/multispectral-object-detection
A deep learning research project implementing a Transformer-based cross-modality fusion approach for object detection across RGB and thermal image modalities.

This repository implements the Cross-Modality Fusion Transformer (CFT) for multispectral object detection, combining RGB and thermal image data. The approach uses the Transformer's self-attention mechanism to learn long-range dependencies and to perform both intra-modality and inter-modality fusion during feature extraction. Built on YOLOv5, the method improves the robustness of object detection in real-world scenarios by leveraging the complementary information carried by the two imaging modalities.
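To make the intra-/inter-modality idea concrete, here is a minimal, single-head sketch of self-attention over concatenated RGB and thermal tokens. It is not the repository's implementation: the pure-Python matrices, the function names (`cross_modality_fusion`, `self_attention`), the single attention head, and the absence of positional embeddings, feature-map pooling, and multi-layer blocks are all simplifying assumptions for illustration. The point it shows is that once both modalities share one token sequence, each row of the attention matrix spans RGB tokens (intra-modality for an RGB query) and thermal tokens (inter-modality), so one attention step performs both kinds of fusion.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


def cross_modality_fusion(rgb_tokens: Matrix, thermal_tokens: Matrix,
                          w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical fusion step: attend over the joint RGB+thermal sequence.

    Rows of the returned attention matrix indexed by RGB tokens split into an
    RGB block (intra-modality) and a thermal block (inter-modality), and vice
    versa for thermal queries. Assumes w_v keeps the feature dimension so the
    residual addition is well defined.
    """
    tokens = rgb_tokens + thermal_tokens          # concatenate along the token axis
    fused, weights = self_attention(tokens, w_q, w_k, w_v)
    out = [[t + f for t, f in zip(tok, fus)]      # residual connection
           for tok, fus in zip(tokens, fused)]
    n_rgb = len(rgb_tokens)
    return out[:n_rgb], out[n_rgb:], weights      # split back per modality


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    rgb = [[1.0, 0.0], [0.5, 0.5]]        # two hypothetical RGB feature tokens
    thermal = [[0.0, 1.0], [0.2, 0.8]]    # two hypothetical thermal feature tokens
    rgb_out, thermal_out, attn = cross_modality_fusion(rgb, thermal, identity, identity, identity)
    # Attention mass the first RGB token places on thermal tokens (inter-modality)
    print(sum(attn[0][len(rgb):]))
```

In a real detector, the tokens would come from flattened backbone feature maps of each modality, and the projections would be learned rather than fixed to the identity.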