google-research/tapas

BERT learns to read spreadsheets

Google Research's TAPAS lets you ask natural-language questions of structured tables without generating SQL or logical forms.

1.2k stars · Python · Language Models · RAG · Search
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

TAPAS is a transformer-based model that takes a table and a natural-language question, then predicts which cells to select and, optionally, which aggregation operator (such as SUM, COUNT, or AVERAGE) to apply to them to produce the answer. It treats table QA as end-to-end classification over cells and aggregation operators, skipping the traditional intermediate step of generating a query in a language like SQL.
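To make the select-then-aggregate idea concrete, here is a toy post-processing step in plain Python. It is a sketch under stated assumptions, not code from the TAPAS repo: it takes per-cell selection probabilities and a predicted operator as already given, keeps cells above an assumed 0.5 threshold, and uses a NONE/COUNT/SUM/AVERAGE operator set. The function `answer_from_predictions` and the sample table are made up for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) index of a data cell, header excluded


def answer_from_predictions(
    table: List[List[str]],
    cell_probs: Dict[Coord, float],
    operator: str,
    threshold: float = 0.5,  # assumed selection threshold
):
    """Hypothetical helper: turn per-cell probabilities plus a predicted
    aggregation operator into a final answer."""
    # Keep every cell whose selection probability clears the threshold,
    # in row-major order of its coordinates.
    selected = [table[r][c] for (r, c), p in sorted(cell_probs.items()) if p > threshold]
    if operator == "NONE":
        # No aggregation: the answer is the selected cells themselves.
        return selected
    if operator == "COUNT":
        return len(selected)
    # SUM and AVERAGE need numeric cell values.
    values = [float(v) for v in selected]
    if operator == "SUM":
        return sum(values)
    if operator == "AVERAGE":
        return sum(values) / len(values) if values else None
    raise ValueError(f"unknown operator: {operator}")


# Toy data: three rows, the second column holds numbers.
table = [["A", "10"], ["B", "20"], ["C", "30"]]
probs = {(0, 1): 0.9, (1, 1): 0.8, (2, 1): 0.7, (0, 0): 0.1}
print(answer_from_predictions(table, probs, "SUM"))  # prints 60.0
```

Note that in this framing the model never emits the number 60 itself; it only points at cells and picks an operator, and the arithmetic happens afterward.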

The interesting bit

The model flattens the table into a token sequence and encodes its structure through extra embeddings: row and column indices for every token, plus rank embeddings that record the sort order of values in numeric columns. This lets BERT-style attention operate over flattened table tokens as if they were sentences, which is either elegant or horrifying depending on your feelings about spreadsheets.
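A rough sketch of that flattening, with heavy simplifications: whitespace splitting stands in for BERT's WordPiece tokenizer, [CLS]/[SEP] and dates are ignored, and the rank rule is assumed to be "if every value in a column parses as a number, give each cell its 1-based dense rank, otherwise 0". `flatten_table` and `Token` are hypothetical names, not the repo's API.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    segment: int  # 0 = question, 1 = table
    column: int   # 0 for question tokens, 1-based column index otherwise
    row: int      # 0 for question and header tokens, 1-based data row otherwise
    rank: int     # 0 if the column is not numeric, else 1-based numeric rank


def _column_ranks(values: List[str]) -> List[int]:
    """Dense rank of each value, but only if the whole column parses as floats."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return [0] * len(values)
    ordered = sorted(set(numbers))
    return [ordered.index(n) + 1 for n in numbers]


def flatten_table(question: str, header: List[str], rows: List[List[str]]) -> List[Token]:
    """Hypothetical flattening: question tokens, then header, then cells row by row."""
    tokens = [Token(w, 0, 0, 0, 0) for w in question.split()]
    # Precompute per-column ranks so numeric ordering is visible per token.
    ranks_by_col = [_column_ranks([row[c] for row in rows]) for c in range(len(header))]
    # Header tokens carry their column index but row 0.
    for c, name in enumerate(header, start=1):
        tokens += [Token(w, 1, c, 0, 0) for w in name.split()]
    # Data cells are flattened left to right, top to bottom.
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            rank = ranks_by_col[c - 1][r - 1]
            tokens += [Token(w, 1, c, r, rank) for w in cell.split()]
    return tokens


# Toy data, not from any dataset.
toks = flatten_table("total population", ["City", "Population"], [["Alpha", "700"], ["Beta", "285"]])
for tok in toks:
    print(tok)
```

In the real model each of these integer ids would index its own embedding table, and the embeddings are summed with the token embedding before the transformer layers.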

Key highlights

  • Pre-trained on 6.2M table-text pairs from Wikipedia, then fine-tuned on WikiSQL, WTQ, SQA, and TabFact
  • Released in multiple sizes from TINY to LARGE, with and without per-cell position index resetting
  • Also supports table entailment (TabFact) and open-domain table retrieval via dense passage retrieval extensions
  • Available in Hugging Face transformers since v4.1.1 with 28 checkpoints and a live widget
  • Multiple Colab notebooks provided for trying predictions on GPU/TPU without local setup
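The "per-cell position index resetting" option is easiest to see on a toy sequence. This sketch is a simplification under stated assumptions: it only handles table cells that are already split into tokens, ignores the question and special tokens, and uses a hypothetical `position_ids` helper rather than anything from the repo. Absolute positions count across the whole flattened table; reset positions restart at 0 in every cell.

```python
from typing import List


def position_ids(cells: List[List[str]], reset_per_cell: bool) -> List[int]:
    """Hypothetical helper: assign a position index to every cell token.

    `cells` is a list of table cells in flattened order, each already tokenized.
    """
    ids: List[int] = []
    for cell in cells:
        for i, _ in enumerate(cell):
            # Reset variant restarts at 0 inside each cell;
            # the absolute variant keeps counting across cells.
            ids.append(i if reset_per_cell else len(ids))
    return ids


cells = [["New", "York"], ["8.3", "million"], ["Oslo"]]
print(position_ids(cells, reset_per_cell=False))  # prints [0, 1, 2, 3, 4]
print(position_ids(cells, reset_per_cell=True))   # prints [0, 1, 0, 1, 0]
```

With resetting, the position embedding only says where a token sits inside its cell, leaving the row and column embeddings to say where the cell sits in the table.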

Caveats

  • Requires the protocol buffer compiler (protoc) to be installed before running pip install, because of the package's protocol buffer dependencies
  • Self-reported accuracies are medians over three runs, computed with the authors' own evaluation tool rather than the official task metrics
  • WTQ dev accuracy tops out at around 51% even for the largest model, suggesting that table QA remains genuinely hard

Verdict

Worth exploring if you’re building natural-language interfaces to databases or documents with embedded tables. Skip it if you need guaranteed exact SQL generation or your tables are small enough that a traditional query builder suffices.
