
xlang-ai/OSWorld

OSWorld is a benchmark suite for evaluating multimodal AI agents on open-ended computer tasks in real environments.

Velocity (7d): +3.0 ★/day · Trend: steady

OSWorld provides a standardized evaluation framework for measuring how well AI agents (LLMs, VLMs, large action models) can complete tasks in real operating system environments. It supports benchmarking across CLI, GUI, and web interactions, covering domains such as coding, file management, and application control. The benchmark includes verified task instances and evaluation infrastructure, and it supports parallelized evaluation through AWS integration.
