The dataset that taught machines to talk to databases
A crowd-sourced benchmark for turning natural language into SQL, with a leaderboard that tracks how close we've come to replacing database admins with chatbots.

What it does
WikiSQL is a large annotated dataset for training and evaluating natural-language-to-SQL systems. It pairs English questions with SQL queries against structured tables, plus evaluation scripts and a maintained leaderboard. The repo contains the data in JSONL and SQLite formats, along with the original Seq2SQL baseline from Salesforce’s 2017 paper.
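For readers who have not worked with line-delimited JSON, here is a minimal sketch of how such a file is typically read: one JSON object per line, parsed independently. The sample records and their field names (`question`, `table_id`) are made up for illustration and are not taken from the dataset's documented schema; `read_jsonl` is a hypothetical helper, not part of the repo.

```python
import io
import json

# Made-up sample standing in for a .jsonl file on disk. The field names are
# illustrative assumptions, not the dataset's documented schema.
SAMPLE = io.StringIO(
    '{"question": "How many players are from Spain?", "table_id": "t1"}\n'
    '{"question": "What is the highest score?", "table_id": "t2"}\n'
)


def read_jsonl(handle):
    """Yield one parsed object per non-empty line of a text stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(read_jsonl(SAMPLE))
print(len(records), sorted(records[0]))
```

Swapping `SAMPLE` for an open file handle gives the same behavior on a real file, since each line is parsed independently and the whole file never needs to fit in a single JSON document.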
The interesting bit
The leaderboard splits cleanly between two regimes: models trained with gold logical forms, and “weakly supervised” models that learn from question-answer pairs alone. The gap between them has narrowed over time: TAPEX reaches 89.5% test execution accuracy without logical forms. Even so, execution-guided decoding, which runs candidate queries during decoding and discards those that fail or return nothing, remains the dominant trick for squeezing out points.
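Execution accuracy scores a prediction by the result it returns, not by the query text, which is why a model can be trained and judged without gold logical forms. The sketch below illustrates that distinction on a toy in-memory SQLite table. It is not the repo's evaluation script: the table, the queries, and the helpers `logical_form_match` and `execution_match` are all hypothetical, and the string comparison is only a crude stand-in for comparing logical forms.

```python
import sqlite3


def execute(conn, query):
    """Run a query and return its rows as a sorted list for comparison."""
    return sorted(conn.execute(query).fetchall())


def logical_form_match(predicted, gold):
    """Crude stand-in: compare whitespace- and case-normalized query text."""
    def normalize(query):
        return " ".join(query.lower().split())
    return normalize(predicted) == normalize(gold)


def execution_match(conn, predicted, gold):
    """Compare query results; a query that fails to run counts as a miss."""
    try:
        return execute(conn, predicted) == execute(conn, gold)
    except sqlite3.Error:
        return False


# Toy table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, country TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ana", "Spain", 12), ("Luis", "Spain", 7), ("Mia", "Chile", 9)],
)

gold = "SELECT COUNT(name) FROM players WHERE country = 'Spain'"
pred = "SELECT COUNT(score) FROM players WHERE country = 'Spain'"

lf_ok = logical_form_match(pred, gold)     # query text differs in the counted column
ex_ok = execution_match(conn, pred, gold)  # both queries return the same count
print(lf_ok, ex_ok)
```

Here the prediction counts a different column than the gold query, so it fails the form comparison while still returning the same answer. The two metrics can diverge for exactly this reason, and execution accuracy can reward a query that is right for the wrong reason.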
Key highlights
- 80,654 hand-annotated examples across train/dev/test splits
- Evaluation strictly enforces no table-content peeking at inference time
- Maintained leaderboard with results from Salesforce, Microsoft, Alibaba, Ant Group, and others
- Original tokenizer frozen in amber: it depends on a deprecated version of Stanza (the old CoreNLP wrapper), with a Docker image provided for reproducibility
- Data ships as both line-delimited JSON and SQLite databases
Caveats
- Python 3 only; Python 2 support explicitly punted to “welcome a pull request”
- Tokenizer depends on a deprecated CoreNLP wrapper; the authors won’t migrate to the current Stanza, in order to preserve reproducibility
- README truncates before fully describing the data schema
Verdict
Essential if you’re building or benchmarking text-to-SQL models. Skip it if you need a production natural language interface—this is a research dataset, not a drop-in query engine.