NVlabs/MambaVision
A hybrid vision backbone combining Mamba state-space models with self-attention for image classification, object detection, instance segmentation, and semantic segmentation.

MambaVision is a hierarchical vision architecture from NVIDIA Research that combines selective state space models (Mamba) with transformer self-attention blocks. Its authors report a new state-of-the-art Pareto front for Top-1 accuracy versus image throughput on ImageNet-1K. Beyond image classification, it serves as a general-purpose backbone for downstream tasks such as object detection, instance segmentation, and semantic segmentation. The model is implemented in PyTorch, and pretrained checkpoints are available through Hugging Face.
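The sketch below makes the hybrid, hierarchical layout more concrete. It is plain Python that only plans which block type each layer of a four-stage backbone would use. It assumes a layout based on the MambaVision paper's description: the first two stages use convolutional blocks, and in each later stage roughly the first half of the layers are Mamba-style mixer blocks while the remaining layers are self-attention blocks. The function `plan_hybrid_stages`, the block labels, and the example depths are hypothetical and are not the repository's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StagePlan:
    """Block layout for one stage of a hypothetical hybrid backbone."""
    name: str
    block_types: list[str] = field(default_factory=list)


def plan_hybrid_stages(depths: list[int]) -> list[StagePlan]:
    """Assign a block type to every layer of a four-stage hierarchy.

    Assumption (illustrative only): stages 1-2 are convolutional, and in
    stages 3-4 the first half of the layers (rounded up) are Mamba-style
    mixer blocks while the remaining layers are self-attention blocks.
    """
    if len(depths) != 4:
        raise ValueError("expected depths for exactly four stages")
    if any(d < 1 for d in depths):
        raise ValueError("every stage needs at least one block")

    plans: list[StagePlan] = []
    for index, depth in enumerate(depths):
        if index < 2:
            # Early, high-resolution stages: convolutional blocks only.
            blocks = ["conv"] * depth
        else:
            # Later stages: mixer blocks first, self-attention blocks last.
            n_mixer = (depth + 1) // 2
            blocks = ["mamba_mixer"] * n_mixer + ["self_attention"] * (depth - n_mixer)
        plans.append(StagePlan(name=f"stage{index + 1}", block_types=blocks))
    return plans


if __name__ == "__main__":
    # Illustrative depths, not an official MambaVision configuration.
    for stage in plan_hybrid_stages([1, 3, 8, 4]):
        print(stage.name, stage.block_types)
```

The sketch places attention at the end of each later stage because the authors report that adding self-attention blocks in the final layers improves the capture of long-range spatial dependencies. It omits the downsampling between stages and the MLP layers inside each block, both of which a real backbone would include.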