Is multiwoz open source?

Yes — budzianowski/multiwoz is open source, released under the MIT license.

What language is multiwoz written in?

budzianowski/multiwoz is primarily written in Python.

How popular is multiwoz?

budzianowski/multiwoz has 954 stars on GitHub.

Where can I find multiwoz?

budzianowski/multiwoz is on GitHub at https://github.com/budzianowski/multiwoz.


budzianowski/multiwoz

The dataset that launched a thousand chatbots

MultiWOZ is the standard benchmark for task-oriented dialogue systems—10k conversations across hotels, restaurants, trains, and more, with annotated belief states that track what the user actually wants.

★ 954 stars · Python · Data Tooling · Chat Assistants



What it does

MultiWOZ provides roughly 10,000 human-human dialogues spanning seven domains (hotel, restaurant, attraction, train, taxi, hospital, police). Each dialogue includes the user's goal, the user and system utterances, and belief states that track slot values across turns. It is designed to train and evaluate dialogue systems that must understand user intent, track state, and generate appropriate responses. The data is split into train, validation, and test sets, with 1k dialogues each in validation and test.

The interesting bit

The dataset has gone through three major releases (2.0, the 2.1 correction by Amazon, and the 2.2 correction by Google), which tells you something about how messy real dialogue annotation is. The README openly notes that “the goal sometimes was wrongly followed by the turkers” and that some dialogues weren’t finished, which is rare honesty in benchmark documentation. The joint accuracy metric counts a turn as correct only when ALL slots are right, so there’s nowhere to hide partial understanding.
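To make the "all slots" point concrete, here is a minimal sketch of joint accuracy on toy data. The slot names, the flat `domain-slot` dictionary format, and the simplified per-slot score used for contrast are assumptions for illustration, not the repository's evaluation script.

```python
from typing import Dict, List

# Assumed toy format: one flat dict per turn, e.g. {"hotel-area": "north"}.
BeliefState = Dict[str, str]


def joint_accuracy(predictions: List[BeliefState], references: List[BeliefState]) -> float:
    """Fraction of turns where the predicted state equals the reference on every slot."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must cover the same turns")
    if not references:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return correct / len(references)


def slot_accuracy(predictions: List[BeliefState], references: List[BeliefState]) -> float:
    """Simplified per-slot score over reference slots only, shown for contrast."""
    total = sum(len(ref) for ref in references)
    if total == 0:
        return 0.0
    hits = sum(
        1
        for pred, ref in zip(predictions, references)
        for slot, value in ref.items()
        if pred.get(slot) == value
    )
    return hits / total


references = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "friday"},
]
predictions = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # a single wrong slot fails the whole turn
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "friday"},
]

joint = joint_accuracy(predictions, references)
per_slot = slot_accuracy(predictions, references)
print(f"joint accuracy: {joint:.3f}, per-slot accuracy: {per_slot:.3f}")
```

In this toy case one wrong slot out of six costs a full turn, so the joint score (two of three turns) falls well below the per-slot score (five of six slots).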

Key highlights

3,406 single-domain + 7,032 multi-domain dialogues (up to 5 domains)
Belief state structure: semi (domain slots), book (booking slots), booked (confirmed booking)
Hospital and police domains excluded from validation/test sets for fair comparison
Only system utterances have manual dialogue-act annotations; user acts were added heuristically in 2.1 via ConvLab
Benchmark tables track DST progress from 15.57% joint accuracy (MDBT, 2018) to 63.79% (TOATOD, 2023) on 2.2
Zero-shot LLM results now included (GPT-3.5, Codex) for comparison against fine-tuned models
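The semi/book/booked belief state structure listed above can be sketched as follows. This assumes a per-turn dictionary keyed by domain, with `booked` nested inside `book` and with empty strings or "not mentioned" marking unfilled slots; the sample values and the `domain-slot` key format are hypothetical, so check them against the data files before relying on this.

```python
from typing import Any, Dict, List

# Assumed markers for slots the user has not filled in.
EMPTY_VALUES = ("", "not mentioned")


def flatten_belief_state(metadata: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Collapse the nested semi/book structure into flat 'domain-slot' pairs."""
    flat: Dict[str, str] = {}
    for domain, state in metadata.items():
        # "semi": the domain's informable slots (area, stars, day, ...).
        for slot, value in state.get("semi", {}).items():
            if value not in EMPTY_VALUES:
                flat[f"{domain}-{slot}"] = value
        # "book": booking slots; "booked" is handled separately below.
        for slot, value in state.get("book", {}).items():
            if slot != "booked" and value not in EMPTY_VALUES:
                flat[f"{domain}-book-{slot}"] = value
    return flat


def booked_domains(metadata: Dict[str, Dict[str, Any]]) -> List[str]:
    """Domains whose 'booked' list is non-empty, i.e. a booking was confirmed."""
    return [
        domain
        for domain, state in metadata.items()
        if state.get("book", {}).get("booked")
    ]


# Hypothetical belief state for one turn.
turn_metadata = {
    "hotel": {
        "semi": {"area": "north", "stars": "4", "parking": "not mentioned"},
        "book": {
            "stay": "3",
            "people": "",
            "booked": [{"name": "example hotel", "reference": "ABC123"}],
        },
    },
    "train": {
        "semi": {"day": "friday", "destination": ""},
        "book": {"people": "", "booked": []},
    },
}

flat_state = flatten_belief_state(turn_metadata)
confirmed = booked_domains(turn_metadata)
print(flat_state)
print(confirmed)
```

Flattening to `domain-slot` keys is a common convenience for comparing predicted and reference states turn by turn; the nested form is kept in the data because the booking information has a different shape from the ordinary slots.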

Caveats

No 1-to-1 mapping between dialogue acts and sentences
MUL/PMUL vs SNG/SSNG/WOZ filename conventions are easy to mix up
Some evaluation scripts (like SimpleTOD’s) inflate scores by conflating dontcare and none
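The last caveat is easy to see on toy data. The sketch below is an illustration of the general effect under an assumed flat `domain-slot` state format; it is not a reproduction of any particular evaluation script, and both normalization functions are hypothetical.

```python
from typing import Dict, List

BeliefState = Dict[str, str]


def normalize_strict(state: BeliefState) -> BeliefState:
    """Drop only 'none' slots, so 'dontcare' remains a value the model must predict."""
    return {slot: value for slot, value in state.items() if value != "none"}


def normalize_lenient(state: BeliefState) -> BeliefState:
    """Drop both 'none' and 'dontcare', which treats the two as interchangeable."""
    return {slot: value for slot, value in state.items() if value not in ("none", "dontcare")}


def joint_accuracy(predictions: List[BeliefState], references: List[BeliefState], normalize) -> float:
    """Fraction of turns whose normalized prediction equals the normalized reference."""
    if not references:
        return 0.0
    correct = sum(
        1 for pred, ref in zip(predictions, references)
        if normalize(pred) == normalize(ref)
    )
    return correct / len(references)


references = [
    {"restaurant-area": "dontcare", "restaurant-food": "italian"},
    {"restaurant-area": "centre", "restaurant-food": "italian"},
]
predictions = [
    {"restaurant-area": "none", "restaurant-food": "italian"},  # missed the 'dontcare'
    {"restaurant-area": "centre", "restaurant-food": "italian"},
]

strict_score = joint_accuracy(predictions, references, normalize_strict)
lenient_score = joint_accuracy(predictions, references, normalize_lenient)
print(f"strict: {strict_score:.2f}, lenient: {lenient_score:.2f}")
```

The same predictions score higher under the lenient normalization because the missed "dontcare" is silently forgiven, which is why numbers from scripts with different conventions are not directly comparable.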

Verdict

Essential if you’re building or benchmarking task-oriented dialogue systems; skip if you’re doing open-domain chitchat or don’t want to wrestle with six-year-old data collection artifacts. The corrected 2.2 version is what you actually want to use.
