huangwl18/VoxPoser
VoxPoser uses large language models and vision-language models to zero-shot synthesize trajectories for robotic manipulation tasks.

Velocity (7d): +0.9 ★/day · Trend: steady
VoxPoser is a zero-shot method that leverages large language models and vision-language models to compose 3D value maps for robotic manipulation. The system generates robot trajectories from natural language commands without requiring any training data. Implementation is provided in the RLBench simulation environment, demonstrating zero-shot generalization to diverse manipulation tasks.
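The idea of composing a 3D value map and reading a trajectory off it can be illustrated with a small, self-contained sketch. This is not the repository's code or API: the grid size, the function names `compose_value_map` and `greedy_trajectory`, the penalty parameters, and the hard-coded target and obstacle voxels are all hypothetical stand-ins for what the language and vision-language models would supply from a natural language command. The planner here is a plain greedy descent, chosen only for brevity.

```python
import itertools
import math

GRID = 10  # voxels per axis (hypothetical resolution)


def compose_value_map(target, obstacles, avoid_weight=2.0, avoid_radius=2.0):
    """Build a cost for every voxel: distance to the target plus a
    penalty that grows as the voxel gets closer to any obstacle."""
    cost = {}
    for voxel in itertools.product(range(GRID), repeat=3):
        c = math.dist(voxel, target)  # "go here" term
        for obs in obstacles:
            d = math.dist(voxel, obs)
            if d < avoid_radius:  # "stay away" term
                c += avoid_weight * (avoid_radius - d)
        cost[voxel] = c
    return cost


def greedy_trajectory(cost, start, max_steps=100):
    """Step to the lowest-cost neighbouring voxel until no neighbour
    improves on the current one (this may stop in a local minimum)."""
    deltas = [d for d in itertools.product((-1, 0, 1), repeat=3) if any(d)]
    path = [start]
    current = start
    for _ in range(max_steps):
        neighbours = [tuple(c + s for c, s in zip(current, d)) for d in deltas]
        neighbours = [n for n in neighbours if n in cost]  # stay inside grid
        best = min(neighbours, key=cost.__getitem__)
        if cost[best] >= cost[current]:
            break
        current = best
        path.append(current)
    return path


# Assumed example scene: reach the far corner while avoiding one obstacle.
value_map = compose_value_map(target=(9, 9, 9), obstacles=[(5, 5, 5)])
trajectory = greedy_trajectory(value_map, start=(0, 0, 0))
```

Because the descent is greedy, a strong enough avoidance penalty can trap it before it reaches the target; a real planner would need to handle that case.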