Is OSWorld open source?

Yes — xlang-ai/OSWorld is open source, released under the Apache-2.0 license.

What language is OSWorld written in?

xlang-ai/OSWorld is primarily written in Python.

How popular is OSWorld?

xlang-ai/OSWorld has 3k stars on GitHub.

Where can I find OSWorld?

xlang-ai/OSWorld is on GitHub at https://github.com/xlang-ai/OSWorld.


xlang-ai/OSWorld

A benchmark that makes multimodal agents control real desktops

OSWorld tests whether vision-language models can complete open-ended tasks by actually controlling Ubuntu and Windows VMs, not mocked APIs or browser tabs.

★3k stars · Python · LLMOps · Eval · Agents · Language Models




What it does

OSWorld is a NeurIPS 2024 benchmark that evaluates multimodal AI agents on open-ended tasks inside real virtual machines running Ubuntu or Windows. Instead of calling sanitized APIs, the agent observes the desktop through screenshots and issues GUI actions to operate real applications like LibreOffice, browsers, and system tools. The repo provides the VM harness, task definitions, scoring logic, and baseline agents so researchers can run end-to-end evaluations of computer-use models.
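The observe-act loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not OSWorld's harness: `StubDesktop`, `scripted_policy`, and `typed_hello` are hypothetical names, the "screenshots" are placeholder bytes, and the pyautogui-style action strings are only logged, never executed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StubDesktop:
    """Hypothetical stand-in for a VM: it logs GUI actions instead of executing them."""
    actions: List[str] = field(default_factory=list)

    def reset(self) -> bytes:
        self.actions.clear()
        return b"screenshot-0"  # a real harness would return PNG bytes of the desktop

    def step(self, action: str) -> bytes:
        self.actions.append(action)
        return f"screenshot-{len(self.actions)}".encode()


Policy = Callable[[str, bytes], str]
Checker = Callable[[StubDesktop], float]


def run_episode(env: StubDesktop, policy: Policy, instruction: str,
                check: Checker, max_steps: int = 15) -> float:
    """Observe, act one step at a time, then score whatever state the agent left behind."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(instruction, obs)
        if action == "DONE":  # the agent declares the task finished
            break
        obs = env.step(action)
    return check(env)  # a task-specific checker assigns the score


def scripted_policy(instruction: str, obs: bytes) -> str:
    """Hypothetical policy: a fixed plan standing in for a vision-language model."""
    plan = ["pyautogui.click(960, 540)", "pyautogui.write('hello')", "DONE"]
    step = int(obs.decode().split("-")[1])  # recover the step index from the stub screenshot
    return plan[min(step, len(plan) - 1)]


def typed_hello(env: StubDesktop) -> float:
    """Hypothetical checker; a real one would inspect files or app state inside the VM."""
    return 1.0 if "pyautogui.write('hello')" in env.actions else 0.0


score = run_episode(StubDesktop(), scripted_policy, "Type hello into the editor", typed_hello)
print(score)  # 1.0
```

In a real run, the policy would be a vision-language model reading actual screenshots, and the checker would examine files or application state inside the VM rather than an action log.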

The interesting bit

Many agent benchmarks keep agents inside tidy sandboxes; OSWorld drops them into full desktop environments where window managers, pop-ups, and OAuth flows behave exactly as they do for human users. The framework supports multiple backends (VMware, VirtualBox, Docker with KVM, and AWS), so you can scale from a single laptop to parallel cloud evaluation, which reportedly brings a full benchmark run in under an hour.
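Scaling out mostly comes down to giving each worker its own VM and splitting the task list among the workers. The sketch below assumes a hypothetical `run_task` callable that would boot or reuse a VM on the chosen backend and return a score; the backend strings are illustrative labels, not OSWorld's actual configuration flags.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# The four host options named above; these string keys are hypothetical, not OSWorld's flags.
BACKENDS = {"vmware", "virtualbox", "docker", "aws"}

# Hypothetical signature: (backend, worker_id, task_id) -> score in [0, 1].
RunTask = Callable[[str, int, str], float]


def shard(task_ids: List[str], n_workers: int) -> List[List[str]]:
    """Round-robin split so each worker (one VM apiece) gets a near-equal share of tasks."""
    return [task_ids[i::n_workers] for i in range(n_workers)]


def run_worker(run_task: RunTask, backend: str, worker_id: int,
               tasks: List[str]) -> Dict[str, float]:
    """One worker runs its shard sequentially, as it would on a single VM."""
    return {task: run_task(backend, worker_id, task) for task in tasks}


def evaluate_parallel(task_ids: List[str], run_task: RunTask,
                      backend: str = "docker", n_workers: int = 4) -> Dict[str, float]:
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_worker, run_task, backend, i, chunk)
                   for i, chunk in enumerate(shard(task_ids, n_workers))]
        for future in futures:
            results.update(future.result())  # re-raises any worker exception here
    return results


tasks = [f"task-{i}" for i in range(10)]
scores = evaluate_parallel(tasks, lambda backend, wid, t: 1.0, backend="aws", n_workers=3)
print(len(scores), sum(scores.values()))  # 10 10.0
```

Round-robin sharding keeps shard sizes within one task of each other. With real VMs, wall-clock time is bounded by the slowest shard, so uneven task durations matter more than the split itself.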

Key highlights

Tests on live OS instances rather than simulated or browser-only environments
Supports Ubuntu and Windows guests across VMware, VirtualBox, Docker, and AWS hosts
Ships with baseline agents, per-domain scoring, and a manual task examination tool
Runtime credential injection lets you mount secrets without baking them into VM images
Recent “OSWorld-Verified” update tightened evaluation signals and expanded AWS parallelization support
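Per-domain scoring raises a design question: whether a headline number averages over tasks or over domains. A minimal sketch, assuming results arrive as hypothetical `(domain, score)` pairs; the domain names in the example are illustrative, and this is not OSWorld's own aggregation code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # hypothetical (domain, score) pair


def per_domain_success(results: List[Result]) -> Dict[str, float]:
    """Mean score within each domain, keyed in sorted domain order."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for domain, score in results:
        totals[domain] += score
        counts[domain] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


def task_weighted_success(results: List[Result]) -> float:
    """Every task counts once, so domains with more tasks weigh more."""
    return sum(score for _, score in results) / len(results)


def domain_weighted_success(results: List[Result]) -> float:
    """Every domain counts once, regardless of how many tasks it has."""
    by_domain = per_domain_success(results)
    return sum(by_domain.values()) / len(by_domain)


results = [("chrome", 1.0), ("chrome", 0.0),
           ("os", 1.0), ("os", 1.0), ("os", 1.0),
           ("libreoffice_calc", 0.0)]
print(per_domain_success(results))  # {'chrome': 0.5, 'libreoffice_calc': 0.0, 'os': 1.0}
print(round(task_weighted_success(results), 3), domain_weighted_success(results))  # 0.667 0.5
```

The two averages diverge whenever domains have different task counts, so it is worth stating which one a reported score uses.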

Caveats

Setup is heavy: requires hypervisors or Docker with KVM, plus downloaded VM images
Several tasks require a Google account with OAuth 2.0 credentials plus a proxy configuration; without them, those tasks fail and scores come out artificially low
macOS hosts cannot use Docker/KVM and must fall back to VMware Fusion
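Because missing OAuth or proxy settings make tasks fail for reasons unrelated to the agent, a pre-flight check can flag those tasks before a run. This sketch assumes secrets are supplied through environment variables at runtime (in the spirit of the credential-injection feature listed above); the requirement labels, variable names, and task ids are hypothetical, not OSWorld's configuration keys.

```python
import os
from typing import Dict, List, Mapping, Set

# Hypothetical mapping from a task requirement to the env vars that must be non-empty.
REQUIREMENT_VARS: Dict[str, List[str]] = {
    "google_oauth": ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET"],
    "proxy": ["HTTPS_PROXY"],
}


def missing_requirements(env: Mapping[str, str]) -> Set[str]:
    """Requirements with at least one variable absent or empty in `env`."""
    return {req for req, names in REQUIREMENT_VARS.items()
            if not all(env.get(name) for name in names)}


def tasks_at_risk(tasks: Dict[str, Set[str]],
                  env: Mapping[str, str] = os.environ) -> List[str]:
    """Task ids whose declared requirements are unmet; report these apart from agent failures."""
    missing = missing_requirements(env)
    return sorted(task for task, reqs in tasks.items() if reqs & missing)


# Hypothetical task ids and their declared requirements.
tasks = {"gdrive-upload": {"google_oauth"}, "calc-sum": set(), "web-search": {"proxy"}}
print(tasks_at_risk(tasks, {"HTTPS_PROXY": "http://proxy.example:8080"}))  # ['gdrive-upload']
```

Reading secrets from the environment at run time keeps them out of VM images, and separating at-risk tasks from genuine failures keeps a misconfigured host from being mistaken for a weak agent.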

Verdict

Worth a look if you are building or benchmarking multimodal agents for computer automation. Skip it if you need a lightweight, API-only evaluation framework.
