Is meta-dataset open source?

Yes — google-research/meta-dataset is open source, released under the Apache-2.0 license.

What language is meta-dataset written in?

GitHub lists Jupyter Notebook as the primary language of google-research/meta-dataset; the library code itself is Python.

How popular is meta-dataset?

google-research/meta-dataset has 804 stars on GitHub.

Where can I find meta-dataset?

google-research/meta-dataset is on GitHub at https://github.com/google-research/meta-dataset.

google-research/meta-dataset

A benchmark that makes few-shot learning actually prove itself

Google Research built a meta-learning stress test from ten real-world datasets, because classifying 5 images of a dog isn't a career.

★ 804 stars · Jupyter Notebook · Data Tooling · LLMOps · Eval

What it does

Meta-Dataset is a benchmark and data pipeline for few-shot learning: training models to classify new categories from a handful of examples. It bundles ten diverse visual datasets (ImageNet, Omniglot, Aircraft, Birds, Textures, QuickDraw, Fungi, VGG Flower, Traffic Signs, MSCOCO) into a single evaluation framework built on standardized “episodes”: sampled tasks in which a model gets a small labeled support set from a few new classes and must classify a query set of test images from those same classes. Unlike fixed 5-way, 5-shot benchmarks, the number of classes and the number of examples per class vary from episode to episode.
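The episode structure described above can be sketched in a few lines of Python. This is a simplified illustration, not the repo's sampler: Meta-Dataset's own sampler is more involved (it varies both the way and the shot counts and allows class imbalance), whereas this sketch fixes the shot and query counts and only draws the number of ways at random. The function name `sample_episode` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(examples, num_ways, num_shots, num_queries, rng):
    """Sample one episode from (example_id, class_label) pairs.

    Returns (support, query), each a list of (example_id, episode_label),
    where episode labels are re-indexed to 0..num_ways-1.
    """
    by_class = defaultdict(list)
    for example_id, label in examples:
        by_class[label].append(example_id)

    # Only classes with enough examples for both support and query are eligible.
    needed = num_shots + num_queries
    eligible = sorted(c for c, ids in by_class.items() if len(ids) >= needed)
    if len(eligible) < num_ways:
        raise ValueError("not enough classes with sufficient examples")

    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, num_ways)):
        # Draw disjoint support and query examples for this class.
        ids = rng.sample(by_class[cls], needed)
        support += [(i, episode_label) for i in ids[:num_shots]]
        query += [(i, episode_label) for i in ids[num_shots:]]
    return support, query


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy pool: 20 classes with 30 examples each.
    pool = [(f"c{c}_{i}", c) for c in range(20) for i in range(30)]
    # Vary the number of ways per episode (hypothetical range 5..10).
    num_ways = rng.randint(5, 10)
    support, query = sample_episode(pool, num_ways, num_shots=5, num_queries=10, rng=rng)
    print(num_ways, len(support), len(query))
```

Re-indexing labels per episode is what makes each episode a fresh task: the model cannot memorize a global class ID and must rely on the support set.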

The repository includes the full data conversion pipeline, training scripts, and reference implementations for several meta-learning baselines (MAML, Prototypical Networks, Matching Networks) plus two follow-up methods: CrossTransformers (spatially-aware Transformer, SOTA on ImageNet-only training as of NeurIPS 2020) and FLUTE (a “universal template” approach with FiLM parameters, SOTA on train-on-all as of ICML 2021).
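To show what one of those baselines actually computes, here is a minimal sketch of the Prototypical Networks classification rule: each class prototype is the mean of its support embeddings, and a query is scored by softmax over negative squared Euclidean distances to the prototypes. It assumes the embeddings are already computed (in practice they come from a learned network) and that every class has at least one support example; the function names are hypothetical and this is not the repo's implementation.

```python
import math


def prototypes(support_embeddings, support_labels, num_classes):
    """Mean support embedding per class (assumes every class is present)."""
    dim = len(support_embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, label in zip(support_embeddings, support_labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += emb[d]
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_embedding, protos):
    """Softmax over negative squared distances to each prototype."""
    logits = [-squared_distance(query_embedding, p) for p in protos]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    support = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    protos = prototypes(support, [0, 0, 1, 1], num_classes=2)
    probs = classify([1.0, 1.0], protos)
    print(protos, probs.index(max(probs)))
```

Because the classifier is rebuilt from the support set in every episode, the same rule works for any number of ways, which suits variable-way episodes.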

The interesting bit

Most few-shot benchmarks use a single dataset with held-out classes. Meta-Dataset forces models to generalize across datasets: a model trained on natural images must also handle sketches, textures, or traffic signs. The leaderboard shows this is hard, with even strong methods losing accuracy on out-of-distribution datasets. The project also documents a subtle bug (#54) in which Traffic Signs evaluation needed its samples shuffled, suggesting the maintainers actually care about measurement integrity.
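To make the cross-dataset setup concrete, the sketch below records which evaluation datasets are unseen during training under the two settings mentioned earlier (ImageNet-only and train-on-all). It follows the original protocol as described in the Meta-Dataset paper, where Traffic Signs and MSCOCO are reserved for evaluation only; the dataset names are plain labels, not the repo's internal identifiers, and `split_eval_datasets` is a hypothetical helper.

```python
# Plain-language dataset labels (not the repo's internal identifiers).
ALL_DATASETS = [
    "ImageNet", "Omniglot", "Aircraft", "Birds", "Textures",
    "QuickDraw", "Fungi", "VGG Flower", "Traffic Signs", "MSCOCO",
]

# Datasets used only for evaluation, never for training.
EVAL_ONLY = {"Traffic Signs", "MSCOCO"}

# Training sources under each setting.
TRAINING_SOURCES = {
    "imagenet_only": {"ImageNet"},
    "all_datasets": set(ALL_DATASETS) - EVAL_ONLY,
}


def split_eval_datasets(setting):
    """Split evaluation datasets into seen-source and unseen-source groups.

    Even "seen" datasets are evaluated on held-out test classes; "unseen"
    datasets contribute no training data at all under this setting.
    """
    seen = TRAINING_SOURCES[setting]
    in_domain = [d for d in ALL_DATASETS if d in seen]
    out_of_domain = [d for d in ALL_DATASETS if d not in seen]
    return in_domain, out_of_domain


if __name__ == "__main__":
    for setting in TRAINING_SOURCES:
        seen, unseen = split_eval_datasets(setting)
        print(setting, "unseen:", unseen)
```

Under ImageNet-only training, nine of the ten evaluation datasets come from sources the model never trained on, which is why that setting is such a hard generalization test.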

Key highlights

TFDS-based input pipeline released for both original (MD-v1) and updated VTAB+MD (MD-v2) protocols
Pre-trained checkpoints available for CrossTransformers (three variants) and FLUTE
Leaderboard with confidence intervals and per-dataset breakdowns, not just aggregate scores
Includes an introductory Jupyter notebook demonstrating episode sampling
Code and configs preserved for arXiv v1; v2 reproduction in active development on arxiv_v2_dev branch
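Reporting per-dataset mean accuracy with a 95% confidence interval over test episodes is the usual convention in few-shot papers. The sketch below uses the normal approximation (1.96 standard errors of the mean); that formula is an assumption about how such intervals are typically computed, not taken from the repo's code, and the helper names are hypothetical.

```python
import math
import statistics


def mean_and_ci95(accuracies):
    """Mean episode accuracy and 95% CI half-width (normal approximation).

    Requires at least two episodes so the sample standard deviation exists.
    """
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(n)
    return mean, half_width


def report(per_dataset):
    """Format 'dataset: mean ± ci' lines, in percent, one per dataset."""
    lines = []
    for name, accs in per_dataset.items():
        mean, hw = mean_and_ci95(accs)
        lines.append(f"{name}: {100 * mean:.2f} ± {100 * hw:.2f}")
    return lines


if __name__ == "__main__":
    # Hypothetical per-episode accuracies, for illustration only.
    results = {"Omniglot": [0.5, 0.7], "Textures": [0.6, 0.6, 0.9]}
    for line in report(results):
        print(line)
```

Keeping the interval per dataset, rather than only an aggregate, shows whether a method's gain is real on a given source or within episode-sampling noise.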

Caveats

Not an officially supported Google product; maintenance appears research-driven
Instructions for reproducing arXiv v2 results are still in progress
Heavy TensorFlow/TFDS dependency; PyTorch users are on their own for porting

Verdict

Worth your time if you’re doing meta-learning research and need a rigorous benchmark that punishes dataset overfitting. Skip it if you want plug-and-play few-shot learning for a product — this is a measurement tool, not a library.