kyegomez/RT-2

Turning camera frames and captions into robot commands, in PyTorch

An unofficial PyTorch rebuild of Google's RT-2 vision-language-action model.

★ 581 stars · Python · Agents · Domain Apps · Language Models

What it does

The RT2 class is a PyTorch module that approximates Google’s RT-2 architecture. It feeds camera images and text captions into a PaLM-E-style backbone, concatenates the vision and language embeddings in a shared space, and emits action tokens as if they were ordinary text. In short, it tries to turn what a robot sees and what it is told into what it should do next.
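The data flow described above can be sketched in a few lines of PyTorch. This is a toy illustration, not the repository's RT2 class: the name ToyVisionLanguageAction, the layer sizes, the vocabulary size, and the use of a small Transformer encoder in place of a PaLM-E-style backbone are all assumptions made for the example. Positional embeddings and causal masking are left out for brevity.

```python
import torch
from torch import nn


class ToyVisionLanguageAction(nn.Module):
    """Hypothetical toy model; NOT the repository's RT2 implementation."""

    def __init__(self, vocab_size=512, dim=64, patch_size=16, depth=2, heads=4):
        super().__init__()
        # Cut each image into non-overlapping patches and project every patch to `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Caption tokens are embedded into the same `dim`-sized space.
        self.token_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        # Stand-in for the large vision-language backbone (assumption).
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # One shared output vocabulary: text tokens and action tokens alike.
        self.to_logits = nn.Linear(dim, vocab_size)

    def forward(self, images, caption_ids):
        # images: (batch, 3, H, W) -> vision: (batch, num_patches, dim)
        vision = self.patch_embed(images).flatten(2).transpose(1, 2)
        # caption_ids: (batch, seq_len) -> text: (batch, seq_len, dim)
        text = self.token_embed(caption_ids)
        # Concatenate both modalities along the sequence axis.
        fused = torch.cat([vision, text], dim=1)
        hidden = self.backbone(fused)
        # Return logits for the text positions only.
        num_patches = vision.shape[1]
        return self.to_logits(hidden[:, num_patches:])


model = ToyVisionLanguageAction().eval()
images = torch.randn(2, 3, 64, 64)            # two random RGB frames
caption_ids = torch.randint(0, 512, (2, 10))  # two random 10-token captions
with torch.no_grad():
    logits = model(images, caption_ids)       # shape: (2, 10, 512)
```

Each position in the output is a distribution over one shared vocabulary, so a decoding step can yield either a word or an action token. That shared vocabulary is what "emitting actions as text" amounts to in this sketch.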

The interesting bit

The author openly notes that the architecture is “quite easy to architect,” which is both a selling point and an admission that the repository is largely structural glue around existing vision-language components. The honesty is refreshing: the hard part, genuine multimodal understanding, is explicitly flagged as missing.

Key highlights

Treats robotic control as a language-modeling problem by representing actions as tokens in the model’s output vocabulary.
Exposes a single RT2 PyTorch module that accepts image tensors and text token sequences.
Includes a dataset reference table matching the original paper’s web-scale and robotics data mixtures.
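To make the first highlight concrete, here is a minimal sketch of how a continuous robot action could be turned into tokens and back. It assumes 256 uniform bins per action dimension (the discretization described in the RT-2 paper, not verified against this repository), actions normalized to [-1, 1], and action tokens placed in the last 256 IDs of a hypothetical 32,000-entry vocabulary. The function names are illustrative only.

```python
# Assumptions for illustration only: bin count, vocabulary size, and token
# placement are hypothetical and not taken from the repository.
NUM_BINS = 256
VOCAB_SIZE = 32_000
ACTION_TOKEN_OFFSET = VOCAB_SIZE - NUM_BINS


def action_to_tokens(action, low=-1.0, high=1.0):
    """Map each continuous action dimension to one token ID."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        # Scale to [0, NUM_BINS - 1] and snap to the nearest bin.
        bin_index = int(round((clipped - low) / (high - low) * (NUM_BINS - 1)))
        tokens.append(ACTION_TOKEN_OFFSET + bin_index)
    return tokens


def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Invert action_to_tokens, up to the quantization error of one bin."""
    return [
        low + (token - ACTION_TOKEN_OFFSET) / (NUM_BINS - 1) * (high - low)
        for token in tokens
    ]


# Hypothetical 4-dimensional action; out-of-range values are clipped.
action = [-1.0, 0.25, 1.0, 3.0]
tokens = action_to_tokens(action)
recovered = tokens_to_action(tokens)
```

Once actions live in the token vocabulary, the same next-token training objective used for text also covers control. The price is quantization error of at most half a bin width per dimension.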

Caveats

The author explicitly warns that the architecture “suffers from a lack of deep understanding of both the unified multi modal representation or the individual modality representations.”
The README documents only a basic forward-pass example; it is unclear whether training scripts, pre-trained weights, or full dataset loaders are included.
The relationship between the implemented RT2 module and the full co-fine-tuning pipeline described in the paper is left vague.

Verdict

Worth a look if you want a minimal, hackable PyTorch skeleton of the RT-2 token-as-action idea. Skip it if you need a trained model, dataset loaders, or a reproducible robotics pipeline out of the box.

Frequently asked

What is kyegomez/RT-2? An unofficial PyTorch rebuild of Google's RT-2 vision-language-action model.
Is RT-2 open source? Yes. kyegomez/RT-2 is open source, released under the MIT license.
What language is RT-2 written in? kyegomez/RT-2 is primarily written in Python.
How popular is RT-2? kyegomez/RT-2 has 581 stars on GitHub.
Where can I find RT-2? kyegomez/RT-2 is on GitHub at https://github.com/kyegomez/RT-2.