
UMass-Embodied-AGI/3D-LLM

A large language model that accepts 3D representations (objects and scenes) as inputs for 3D visual reasoning tasks.

1.2k stars · Python · Language Models · Computer Vision
Velocity (7d): +1.1 ★ / day · Trend: steady

3D-LLM is the first LLM system designed to process 3D point clouds and scene data from sources like ScanNet and Objaverse. The model can perform 3D captioning, question answering, and task planning on 3D environments by encoding spatial and semantic information into the language model. It uses LAVIS as its underlying vision-language framework and provides pretrained and fine-tuned checkpoints for downstream 3D understanding tasks.
