OpenHelix-Team/VLA-Adapter
A compact vision-language-action model that enables robots to perceive visual input, understand language instructions, and generate executable physical actions.

VLA-Adapter introduces an efficient paradigm for building tiny-scale vision-language-action models suitable for embodied AI applications. The system takes RGB images and language commands as input and predicts robot action sequences in real time. It is designed for deployment on physical robot platforms, including the ALOHA robot arm and the Cobot Magic collaborative robot, and supports robotic manipulation tasks in both simulation and the real world.
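
The input/output contract described above (RGB image plus language command in, action sequence out) can be illustrated with a minimal sketch. This is not the VLA-Adapter API: `Observation`, `ZeroPolicy`, `predict`, and the `horizon` and `action_dim` values are hypothetical names and numbers chosen for illustration, and the placeholder policy returns zeros instead of running a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One policy input (hypothetical structure, not the project's own)."""
    rgb: List[List[List[int]]]  # H x W x 3 image, values 0-255
    instruction: str            # natural-language command


class ZeroPolicy:
    """Stand-in for a vision-language-action model.

    A real model would encode the image and instruction and decode actions;
    this placeholder only demonstrates the shape of the output.
    """

    def __init__(self, horizon: int = 8, action_dim: int = 7) -> None:
        # Both values are assumptions: 7 could stand for a 6-DoF end-effector
        # delta plus a gripper command, but the source does not specify this.
        self.horizon = horizon
        self.action_dim = action_dim

    def predict(self, obs: Observation) -> List[List[float]]:
        """Return an action sequence with `horizon` steps of `action_dim` values."""
        if not obs.instruction.strip():
            raise ValueError("instruction must be non-empty")
        return [[0.0] * self.action_dim for _ in range(self.horizon)]


if __name__ == "__main__":
    obs = Observation(rgb=[[[0, 0, 0]]], instruction="pick up the cup")
    actions = ZeroPolicy().predict(obs)
    print(len(actions), len(actions[0]))
```

In a control loop, a robot would typically execute some or all of the returned steps before requesting a new prediction from a fresh observation; how many steps to execute is a deployment choice that the description above does not fix.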