UMass-Embodied-AGI/3D-LLM
A large language model that accepts 3D representations (objects and scenes) as inputs for 3D visual reasoning tasks.

Velocity (7d): +1.1 ★/day · Trend: steady
3D-LLM is the first LLM system designed to process 3D point clouds and scene data from sources like ScanNet and Objaverse. The model can perform 3D captioning, question answering, and task planning on 3D environments by encoding spatial and semantic information into the language model. It uses LAVIS as its underlying vision-language framework and provides pretrained and fine-tuned checkpoints for downstream 3D understanding tasks.
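
As a rough illustration of what "encoding spatial and semantic information into the language model" can look like, the sketch below pools per-point features into a handful of scene tokens, projects them to the language model's embedding width, and prepends them to the text embeddings. It is not taken from the 3D-LLM repository: the function names (`pool_points_to_tokens`, `project`), the voxel-average pooling, the random weights, and all dimensions are assumptions chosen only to keep the example small and dependency-free.

```python
import random
from collections import defaultdict


def pool_points_to_tokens(points, feats, grid=2):
    """Average the features and positions of the points in each voxel.

    Hypothetical helper: returns one token per occupied voxel, made of the
    mean feature vector followed by the mean (x, y, z) position.
    """
    dims = range(3)
    mins = [min(p[d] for p in points) for d in dims]
    spans = [max(max(p[d] for p in points) - mins[d], 1e-8) for d in dims]
    buckets = defaultdict(list)
    for p, f in zip(points, feats):
        # Normalise each coordinate to [0, 1] and bucket it into a voxel index.
        cell = tuple(
            min(int((p[d] - mins[d]) / spans[d] * grid), grid - 1) for d in dims
        )
        buckets[cell].append(list(f) + list(p))
    tokens = []
    for cell in sorted(buckets):
        rows = buckets[cell]
        tokens.append([sum(col) / len(rows) for col in zip(*rows)])
    return tokens


def project(tokens, weight):
    """Multiply each token (length d_in) by a d_in x d_out weight matrix."""
    return [
        [sum(t * w for t, w in zip(tok, col)) for col in zip(*weight)]
        for tok in tokens
    ]


random.seed(0)
# Toy scene: 200 points with 4-dimensional per-point features (assumed sizes).
points = [[random.uniform(0.0, 4.0) for _ in range(3)] for _ in range(200)]
feats = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

scene_tokens = pool_points_to_tokens(points, feats, grid=2)  # at most 8 tokens
weight = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(7)]  # 7 -> 8
scene_embeds = project(scene_tokens, weight)

# Stand-in for the embedded text prompt: 5 tokens of width 8.
text_embeds = [[0.0] * 8 for _ in range(5)]
llm_input = scene_embeds + text_embeds  # scene tokens first, then text
print(len(llm_input), len(llm_input[0]))
```

In a real system the per-point features would come from a pretrained visual encoder and the projection would be learned; the sketch only shows the data flow from a point cloud to a token sequence that a language model can consume.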