Is WikiSQL open source?

Yes — salesforce/WikiSQL is open source, released under the BSD-3-Clause license.

What language is WikiSQL written in?

GitHub's language statistics list salesforce/WikiSQL as primarily HTML, though its evaluation and annotation scripts are Python.

How popular is WikiSQL?

salesforce/WikiSQL has 1.8k stars on GitHub.

Where can I find WikiSQL?

salesforce/WikiSQL is on GitHub at https://github.com/salesforce/WikiSQL.


salesforce/WikiSQL

The dataset that taught machines to talk to databases

A crowd-sourced benchmark for turning natural language into SQL, with a leaderboard that tracks how close we've come to replacing database admins with chatbots.

★1.8k stars · HTML · Data Tooling · Language Models




What it does

WikiSQL is a large annotated dataset for training and evaluating natural-language-to-SQL systems. It pairs English questions with SQL queries against structured tables, plus evaluation scripts and a maintained leaderboard. The repo contains the data in JSONL and SQLite formats, along with the original Seq2SQL baseline from Salesforce’s 2017 paper.
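Each question is paired with a small structured logical form rather than raw SQL text. The sketch below renders such a form as a SQL string. It assumes the `sel` / `agg` / `conds` field names and the operator lists as recalled from the repo's `lib/query.py`; treat them as assumptions to verify against the source. The example record, table header, and `table_name` are hypothetical.

```python
# Minimal sketch: render a WikiSQL-style logical form as a SQL string.
# ASSUMPTION: the "sel"/"agg"/"conds" fields and the operator lists below
# follow the repo's lib/query.py as recalled; verify against the source.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]  # "OP" is a placeholder, not real SQL


def quote_literal(value):
    """SQL-quote strings (doubling embedded quotes); leave numbers bare."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(sql, header, table_name="t"):
    """Render {'sel', 'agg', 'conds'} as a single-table SELECT."""
    column = '"' + header[sql["sel"]] + '"'
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg}({column})" if agg else column
    query = f"SELECT {select} FROM {table_name}"
    # Each condition is a [column index, operator index, value] triple.
    clauses = [
        f'"{header[col]}" {COND_OPS[op]} {quote_literal(value)}'
        for col, op, value in sql["conds"]
    ]
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Hypothetical record shaped like one JSONL line.
example = {
    "question": "Which player wore number 42?",
    "table_id": "1-0000000-1",
    "sql": {"sel": 0, "agg": 0, "conds": [[1, 0, "42"]]},
}
header = ["Player", "No."]  # hypothetical table header
print(to_sql(example["sql"], header))
# SELECT "Player" FROM t WHERE "No." = '42'
```

Every query in this format is a single-table SELECT with at most one aggregate and a conjunction of conditions, which is why the logical form stays this small.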

The interesting bit

The leaderboard splits cleanly between two regimes: models trained with gold logical forms, and “weakly supervised” models that learn from question-answer pairs alone. The gap between them has narrowed over time (TAPEX reaches 89.5% test execution accuracy without logical forms), but execution-guided decoding remains the dominant trick for squeezing out extra points.
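Execution accuracy counts a prediction as correct when running it returns the same result as running the gold query, and execution guidance uses the database itself to discard candidates that fail. The sketch below shows both ideas on an in-memory SQLite table. It is a simplified, post-hoc beam filter (published execution-guided decoding also checks partial programs during decoding), and every function name, table, and query here is hypothetical rather than taken from the repo's evaluation scripts.

```python
import sqlite3


def execute(conn, query):
    """Run a query; return its rows (order-insensitive) or None on error."""
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)


def execution_accuracy(conn, predicted, gold):
    """Share of examples whose predicted query returns the gold result."""
    hits = 0
    for pred, ref in zip(predicted, gold):
        got = execute(conn, pred)
        hits += got is not None and got == execute(conn, ref)
    return hits / len(gold)


def execution_guided_pick(conn, beam):
    """Simplified execution guidance: take the best-ranked candidate that
    runs without error and returns at least one row."""
    for query in beam:  # ASSUMPTION: beam is sorted best-first
        if execute(conn, query):  # None (error) and [] (empty) both fail
            return query
    return beam[0] if beam else None


# Toy table and beam; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("Player" TEXT, "No." TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?)", [("Ann", "42"), ("Bo", "7")])

gold = 'SELECT "Player" FROM t WHERE "No." = \'42\''
beam = [
    'SELECT "Player" FROM no_such_table',           # fails to execute
    'SELECT "Player" FROM t WHERE "No." = \'99\'',  # runs, but empty
    'SELECT "Player" FROM t WHERE "No." = \'42\'',  # runs, non-empty
]
best = execution_guided_pick(conn, beam)
print(best == beam[2])                                          # True
print(execution_accuracy(conn, [beam[0], best], [gold, gold]))  # 0.5
```

Without guidance the top beam entry would be scored as wrong; with it, the first candidate that actually runs and returns rows is kept instead.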

Key highlights

80,654 hand-annotated examples across train/dev/test splits
Evaluation strictly enforces no table-content peeking at inference time
Maintained leaderboard with results from Salesforce, Microsoft, Alibaba, Ant Group, and others
Original tokenizer frozen in amber: it depends on a deprecated CoreNLP wrapper rather than current Stanza, and a Docker image is provided for reproducibility
Data ships as both line-delimited JSON and SQLite databases
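Line-delimited JSON means one complete record per line, so a file can be streamed without loading it whole. The sketch below streams a questions file and a tables file, joins them by table id, and copies one table into SQLite. The field names (`id`, `header`, `rows`, `table_id`) and the `table_<id>` / `colN` naming are assumptions loosely modeled on the repo's layout, and the in-memory "files" are hypothetical stand-ins for the real data.

```python
import io
import json
import sqlite3


def read_jsonl(stream):
    """Yield one parsed JSON record per non-blank line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def load_table(conn, table):
    """Copy one table record into SQLite with generic col0..colN names.
    ASSUMPTION: the "table_<id>" / "colN" naming loosely mirrors how the
    repo's SQLite files are laid out; it is illustrative, not guaranteed."""
    name = "table_" + table["id"].replace("-", "_")
    cols = ", ".join(f"col{i}" for i in range(len(table["header"])))
    marks = ", ".join("?" for _ in table["header"])
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES ({marks})", table["rows"])
    return name


# Hypothetical in-memory stand-ins for a questions file and a tables file.
questions_file = io.StringIO(
    '{"question": "Who wore 42?", "table_id": "1-1-1", '
    '"sql": {"sel": 0, "agg": 0, "conds": [[1, 0, "42"]]}}\n'
    '{"question": "How many players?", "table_id": "1-1-1", '
    '"sql": {"sel": 0, "agg": 3, "conds": []}}\n'
)
tables_file = io.StringIO(
    '{"id": "1-1-1", "header": ["Player", "No."], '
    '"rows": [["Ann", "42"], ["Bo", "7"]]}\n'
)

# Index tables by id so each question can find its table in O(1).
tables = {t["id"]: t for t in read_jsonl(tables_file)}
questions = list(read_jsonl(questions_file))

conn = sqlite3.connect(":memory:")
name = load_table(conn, tables[questions[0]["table_id"]])
print(name)  # table_1_1_1
print(conn.execute(f"SELECT col0 FROM {name} WHERE col1 = '42'").fetchall())
# [('Ann',)]
```

Positional `colN` names sidestep headers that are not valid SQL identifiers (spaces, punctuation, duplicates), which are common in scraped Wikipedia tables.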

Caveats

Python 3 only; Python 2 support explicitly punted to “welcome a pull request”
Tokenizer dependency on deprecated CoreNLP wrapper; authors won’t migrate to current Stanza to preserve reproducibility
README truncates before fully describing the data schema

Verdict

Essential if you’re building or benchmarking text-to-SQL models. Skip it if you need a production natural language interface—this is a research dataset, not a drop-in query engine.

Frequently asked

What is salesforce/WikiSQL? A crowd-sourced benchmark for turning natural language into SQL, with a leaderboard that tracks how close we've come to replacing database admins with chatbots.