
THU-SI/Spatial-MLLM

Spatial-MLLM enhances existing video multimodal LLMs with visual-based spatial intelligence capabilities.

467 stars · Python · Language Models
Velocity (7d): +1.2 ★ / day · Trend: steady

Spatial-MLLM is a method for improving the visual-based spatial intelligence of existing video multimodal large language models (MLLMs). The project provides supervised fine-tuning code, evaluation code, and pre-trained models for spatial reasoning tasks. It reports state-of-the-art performance on benchmarks such as VSI-Bench and releases models trained on datasets such as Spatial-MLLM-120k.
