
StonyBrookNLP/appworld

AppWorld is a controllable environment for benchmarking function-calling and interactive coding AI agents.

Velocity (7d): +0.6 ★ / day · Trend: steady

AppWorld provides a simulated world of applications and people to evaluate how well LLM-based agents can use tools, call functions, and write code interactively. It serves as a standardized benchmark for measuring the performance of coding agents and function-calling systems. The platform includes a task explorer, API explorer, and leaderboard for comparing different agent implementations on realistic software interaction scenarios.
