lucidrains/robotic-transformer-pytorch
PyTorch implementation of RT1 (Robotic Transformer), a vision-language-action model for robot control.

Velocity (7d): +0.4 ★/day · Trend: steady
This repository implements RT1 (Robotic Transformer) from Google Robotics in PyTorch. The model takes a sequence of video frames and a natural language instruction as input and outputs robot actions for real-world control tasks. It uses a MaxViT backbone with cross-attention conditioning and supports classifier-free guidance during inference.
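To make the classifier-free guidance step concrete, here is a minimal, dependency-free sketch of the standard guidance formula applied to one set of action logits. It is not taken from the repository: the function names `guided_logits` and `argmax`, the parameter name `cond_scale`, and the toy numbers are all assumptions for illustration, and the repository may combine its outputs differently.

```python
# Hypothetical sketch of classifier-free guidance on action logits.
# None of these names come from the repository; they are illustrative only.

def guided_logits(cond_logits, null_logits, cond_scale=3.0):
    """Blend logits computed with and without the text instruction.

    Standard formula: null + cond_scale * (cond - null).
    cond_scale = 1 recovers the conditioned logits (up to float rounding),
    cond_scale = 0 recovers the unconditioned ones, and values above 1
    push the prediction further toward what the instruction implies.
    """
    return [n + cond_scale * (c - n) for c, n in zip(cond_logits, null_logits)]


def argmax(values):
    """Index of the largest value, i.e. the chosen discrete action bin."""
    return max(range(len(values)), key=values.__getitem__)


# Toy logits over three action bins (made-up numbers).
cond = [0.2, 1.0, 0.5]   # forward pass with the instruction
null = [0.4, 0.6, 0.5]   # forward pass with the instruction dropped

guided = guided_logits(cond, null, cond_scale=3.0)
chosen_bin = argmax(guided)
```

In this toy case the instruction already favours the second bin, and guidance widens its margin over the others. Supporting this at inference usually requires the model to have been trained with the instruction randomly dropped some fraction of the time, so that the unconditioned pass is meaningful.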