OthersideAI/self-operating-computer
A framework that lets multimodal AI models autonomously operate computers by viewing screens and executing mouse/keyboard actions.

This framework lets AI models control a computer through the same inputs and outputs a human uses: the model observes the screen and decides on a sequence of mouse and keyboard actions to accomplish an objective. It integrates with multiple multimodal models, including GPT-4o, Claude 3, Gemini Pro Vision, and others. Automation tools such as PyAutoGUI execute the chosen actions. Released in late 2023, it was one of the early full computer-use examples.
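
The observe, decide, act cycle described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the action dictionary schema (`operation`, `x`, `y`, `text`, `keys`), the fractional screen coordinates, and the names `run_loop`, `make_scripted_decider`, and `pyautogui_execute` are all assumptions made for the example. Only the PyAutoGUI calls (`size`, `click`, `write`, `hotkey`) are real library functions, and the model call is replaced by a scripted stand-in so the loop can be tried without an API key.

```python
from typing import Any, Callable, Dict, List

# Hypothetical action format: {"operation": "click" | "write" | "press" | "done", ...}
Action = Dict[str, Any]


def run_loop(
    objective: str,
    capture: Callable[[], Any],
    decide: Callable[[str, Any, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> decide -> act until the decider says "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                            # observe the screen
        action = decide(objective, screenshot, history)   # model picks next action
        history.append(action)
        if action["operation"] == "done":
            break
        execute(action)                                   # act via mouse/keyboard
    return history


def make_scripted_decider(plan: List[Action]) -> Callable[[str, Any, List[Action]], Action]:
    """Stand-in for a multimodal model: replays a fixed plan, then reports done."""
    def decide(objective: str, screenshot: Any, history: List[Action]) -> Action:
        step = len(history)
        return plan[step] if step < len(plan) else {"operation": "done"}
    return decide


def pyautogui_execute(action: Action) -> None:
    """Translate one hypothetical action dict into PyAutoGUI calls."""
    import pyautogui  # imported lazily so the loop can be dry-run without a display

    op = action["operation"]
    if op == "click":
        # Assumed convention: x and y are fractions of the screen size (0.0 to 1.0).
        width, height = pyautogui.size()
        pyautogui.click(int(action["x"] * width), int(action["y"] * height))
    elif op == "write":
        pyautogui.write(action["text"], interval=0.05)
    elif op == "press":
        pyautogui.hotkey(*action["keys"])
    else:
        raise ValueError(f"unsupported operation: {op!r}")
```

For a dry run, pass a recording function such as `executed.append` as `execute` and a dummy `capture`; to drive a real desktop, pass `pyautogui.screenshot` as `capture` and `pyautogui_execute` as `execute`, and replace the scripted decider with a call to a multimodal model. Keeping the executor injectable is a design choice made here so that the loop can be exercised without moving the real mouse.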