web-arena-x/webarena

A self-hostable web environment for building, benchmarking, and evaluating autonomous agents that navigate websites.

1.5k stars · Python · Agents · LLMOps · Eval
Velocity (7d): +1.4 ★ / day · Trend: steady

WebArena provides a realistic web-based testbed for developing and assessing autonomous agents. It enables parallel experiments through BrowserGym integration and supports unified leaderboard reporting across multiple web navigation benchmarks including VisualWebArena. The platform is designed to help researchers reproduce agent evaluation results reported in the associated research paper.
