HumanCompatibleAI/overcooked_ai

A kitchen where robots learn not to bump into you

A research environment that turns the video game Overcooked into a benchmark for studying whether AI agents can actually cooperate with humans, not just optimize solo.

976 stars · Jupyter Notebook · Agents · LLMOps · Eval
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

Overcooked-AI is a Python environment that recreates the frantic cooperative cooking game as a testbed for human-AI coordination. Agents must jointly fetch ingredients, cook soups, and deliver them while navigating around each other in cramped kitchen layouts. The project includes a web demo where you can play alongside trained agents, plus datasets of human-human and human-AI gameplay.

The interesting bit

The research angle is “zero-shot coordination”: training agents that cooperate well with unfamiliar human partners, not just the teammates they were trained with. The environment exposes the tension between maximizing reward and behaving legibly and predictably enough for humans to coordinate with.
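One common proxy for zero-shot coordination is cross-play: pair independently trained agents that never saw each other and compare the team score with what each agent gets alongside a copy of itself. The sketch below only illustrates that comparison. The function name and the score table are made up for this example, and nothing here comes from the Overcooked-AI codebase or its published results.

```python
from statistics import mean


def self_play_and_cross_play(returns):
    """Summarize a square table of team scores.

    returns[i][j] is the team score when agent i is paired with agent j.
    The diagonal holds self-play scores; everything else is cross-play.
    """
    n = len(returns)
    self_play = mean(returns[i][i] for i in range(n))
    cross_play = mean(
        returns[i][j] for i in range(n) for j in range(n) if i != j
    )
    return self_play, cross_play


# Made-up scores for three hypothetical agents (illustration only).
RETURNS = [
    [200, 40, 60],
    [40, 180, 20],
    [60, 20, 220],
]

sp, xp = self_play_and_cross_play(RETURNS)
print(f"self-play: {sp}, cross-play: {xp}")
```

A large gap between the two averages suggests agents that rely on private conventions, which is the failure mode that matters when the partner is a human.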

Key highlights

  • Programmatic layout generation: kitchens can be hand-designed or procedurally generated
  • Built-in planning baselines: A* search and near-optimal planners for comparison
  • Web-based human-AI data collection via a Flask server with Docker deployment
  • PyPI package (pip install overcooked-ai) with lockfile-supported source builds via uv
  • Extensive adoption: used in NeurIPS, AAMAS, RSS, and AAAI papers since 2019
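
To give a feel for what a search-based planning baseline does on a kitchen grid, here is a minimal, self-contained A* sketch. It is not the repository's planner and does not use its API: the layout, the tile symbols, and the function names are assumptions chosen for illustration, and the search covers only single-agent movement across floor tiles.

```python
import heapq

# Hypothetical kitchen layout (not taken from the repository).
# 'X' counter, 'P' pot, 'O' onion dispenser, 'D' dish dispenser,
# 'S' serving window, ' ' walkable floor.
LAYOUT = [
    "XXXPXXX",
    "O     X",
    "X XXX X",
    "X     X",
    "XDXXXSX",
]


def walkable(layout, cell):
    """True if cell is inside the grid and is a floor tile."""
    r, c = cell
    return 0 <= r < len(layout) and 0 <= c < len(layout[r]) and layout[r][c] == " "


def manhattan(a, b):
    """Admissible heuristic for 4-connected grid movement."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(layout, start, goal):
    """Return a shortest list of floor cells from start to goal, or None."""
    if not (walkable(layout, start) and walkable(layout, goal)):
        return None
    frontier = [(manhattan(start, goal), 0, start)]  # (f, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = current
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not walkable(layout, nxt):
                continue
            new_g = g + 1
            if new_g < cost.get(nxt, float("inf")):
                cost[nxt] = new_g
                came_from[nxt] = current
                heapq.heappush(frontier, (new_g + manhattan(nxt, goal), new_g, nxt))
    return None


path = a_star(LAYOUT, (1, 1), (3, 5))
print(len(path) - 1)  # number of moves along the path
```

A planner for the full game would also have to search over object interactions and the partner's position, which this sketch leaves out.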

Caveats

  • RL and behavior cloning training code is deprecated; the maintainers removed PPO/BC implementations and are seeking contributors to rebuild them
  • Raw human gameplay data is hosted on Google Drive rather than in the repo, because it exceeds 100 MB
  • The demo and data collection tools require separate setup in the overcooked_demo directory

Verdict

Grab this if you’re researching human-AI coordination, ad hoc teamwork, or interpretable multi-agent behavior. Skip it if you want a plug-and-play RL training framework: the supported parts are the environment and planning tools, not the learning stack.
