google-research/meta-dataset

A benchmark that makes few-shot learning actually prove itself

Google Research built a meta-learning stress test from ten real-world datasets, because classifying 5 images of a dog isn't a career.

804 stars · Jupyter Notebook · Data Tooling · LLMOps · Eval
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Meta-Dataset is a benchmark and data pipeline for few-shot learning: training models to classify new categories from a handful of examples. It bundles ten diverse visual datasets (ImageNet, Omniglot, Aircraft, Birds, Textures, QuickDraw, Fungi, VGG Flower, Traffic Signs, MSCOCO) into a single evaluation framework built on standardized “episodes”: sampled tasks where you get, say, 5 labeled examples for each of 5 new classes and must classify held-out test images from those same classes.
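To make the episode idea concrete, here is a minimal sketch of fixed-size N-way K-shot sampling in plain Python. It is an illustration only, not the repository's sampler: `sample_episode`, its parameters, and the toy label pool are all hypothetical names invented for this example.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=5, n_query=10, seed=None):
    """Sample one N-way K-shot episode from a list of per-example class labels.

    Hypothetical helper for illustration; not part of Meta-Dataset.
    Returns (support, query): lists of (example_index, episode_label) pairs,
    where episode_label is an integer in range(n_way).
    """
    rng = random.Random(seed)

    # Group example indices by their original class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough examples for support AND query are eligible.
    needed = k_shot + n_query
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= needed)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")

    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(by_class[cls], needed)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy pool (assumed data): 8 classes with 20 examples each.
toy_labels = [c for c in range(8) for _ in range(20)]
support, query = sample_episode(toy_labels, seed=0)
```

The model sees the support pairs with their labels and is scored on the query pairs. Classes are relabeled 0 to N-1 inside each episode, so nothing about the original class identity carries over between episodes.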

The repository includes the full data conversion pipeline, training scripts, and reference implementations for several meta-learning baselines (MAML, Prototypical Networks, Matching Networks) plus two follow-up methods: CrossTransformers (a spatially aware Transformer, SOTA on ImageNet-only training as of NeurIPS 2020) and FLUTE (a “universal template” approach with FiLM parameters, SOTA on train-on-all as of ICML 2021).

The interesting bit

Most few-shot benchmarks use a single dataset with held-out classes. Meta-Dataset forces models to generalize across datasets: a model trained on natural images must also handle sketches, textures, or traffic signs. The leaderboard reveals this is hard: even strong methods collapse on out-of-distribution datasets. The project also tracks a subtle bug (#54) where Traffic Sign evaluation needed shuffled samples, suggesting the maintainers actually care about measurement integrity.

Key highlights

  • TFDS-based input pipeline released for both original (MD-v1) and updated VTAB+MD (MD-v2) protocols
  • Pre-trained checkpoints available for CrossTransformers (three variants) and FLUTE
  • Leaderboard with confidence intervals and per-dataset breakdowns, not just aggregate scores
  • Includes an introductory Jupyter notebook demonstrating episode sampling
  • Code and configs preserved for arXiv v1; v2 reproduction in active development on arxiv_v2_dev branch
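For readers unfamiliar with how a leaderboard entry with a confidence interval is typically produced, here is a minimal sketch: average the per-episode accuracies and attach a 95% interval using the normal approximation. The function name `mean_with_ci`, the choice of z = 1.96, and the sample accuracies are assumptions for illustration; the repository's own evaluation code may compute its intervals differently.

```python
import math
import statistics


def mean_with_ci(accuracies, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    Hypothetical helper for illustration. `accuracies` holds one accuracy per
    evaluation episode; z=1.96 corresponds to a 95% interval.
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two episodes to estimate spread")
    mean = statistics.fmean(accuracies)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    half_width = z * statistics.stdev(accuracies) / math.sqrt(n)
    return mean, half_width


# Made-up per-episode accuracies, purely to exercise the function.
episode_accuracies = [0.62, 0.58, 0.71, 0.65, 0.60, 0.68]
mean, half_width = mean_with_ci(episode_accuracies)
print(f"{100 * mean:.1f} ± {100 * half_width:.1f}")
```

The interval shrinks with the square root of the number of episodes, which is why a per-dataset score averaged over many episodes can be compared more reliably than one from a handful of tasks.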

Caveats

  • Not an officially supported Google product; maintenance appears research-driven
  • Instructions for reproducing arXiv v2 results are still in progress
  • Heavy TensorFlow/TFDS dependency; PyTorch users are on their own for porting

Verdict

Worth your time if you’re doing meta-learning research and need a rigorous benchmark that punishes dataset overfitting. Skip it if you want plug-and-play few-shot learning for a product: this is a measurement tool, not a library.
