kyegomez/RT-2
RT-2 is a Vision-Language-Action model that enables robots to perform tasks by translating visual observations and language instructions into physical actions.

Velocity (7d): +0.6 ★/day · Trend: steady
This repository implements Robotic Transformer 2 (RT-2), a multimodal model that connects vision, language, and robot control. The model takes visual inputs and natural-language commands and outputs executable actions for robotic manipulation tasks. It builds on transformer architectures to generalize learned behaviors to novel situations, a key advancement in embodied AI systems.
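To make "outputs executable actions" more concrete, here is a minimal sketch of one way continuous robot actions can be turned into discrete tokens that a transformer can emit, and back again. It assumes the scheme described in the RT-2 paper, where each continuous action dimension is split into 256 uniform bins; whether this repository does the same is not stated here. The function names `discretize` and `undiscretize` and the `[-1, 1]` action range are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Assumption: 256 uniform bins per action dimension, as described in the RT-2 paper.
NUM_BINS = 256


def discretize(
    action: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
    num_bins: int = NUM_BINS,
) -> List[int]:
    """Map each continuous action value in [low, high] to an integer bin index."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)   # keep the value inside the range
        frac = (clipped - low) / (high - low)  # rescale to [0, 1]
        # The upper bound maps to num_bins, so cap it at the last valid bin.
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def undiscretize(
    tokens: Sequence[int],
    low: float = -1.0,
    high: float = 1.0,
    num_bins: int = NUM_BINS,
) -> List[float]:
    """Map bin indices back to the continuous value at the centre of each bin."""
    width = (high - low) / num_bins
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action (for example, an end-effector displacement).
    action = [0.3, -0.7, 1.0]
    tokens = discretize(action)
    print(tokens)
    print(undiscretize(tokens))
```

The round trip is lossy: each recovered value sits at the centre of its bin, so the error per dimension is at most half a bin width (here `2 / 256 / 2`). Treating actions as a short sequence of integers is what lets a language-model-style decoder produce them the same way it produces text tokens.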