mlcommons/croissant

A schema.org-based metadata format and Python library (mlcroissant) for describing, loading, and integrating machine learning datasets.

855 stars · Jupyter Notebook · Data Tooling
croissant
Velocity (7d): +0.7 ★ / day
Trend: steady

Croissant is a high-level JSON-LD format for machine learning datasets that layers metadata, resource descriptions, data structure, and default ML semantics into a unified schema. Developed by MLCommons, it builds on schema.org’s Dataset vocabulary to make datasets easier to find, use, and integrate with ML tools. The mlcroissant Python library provides functions to load datasets, inspect metadata, and use Croissant-formatted datasets in ML workflows with frameworks like TensorFlow Datasets.
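
To make the layering concrete, here is a rough sketch of what a very small Croissant-style description might look like, built as a plain Python dictionary and inspected with the standard library only. The dataset name, URLs, and the `summarize` helper are hypothetical. The `@context` is heavily abbreviated. The keys (`distribution`, `recordSet`, `field`, `source`) follow my reading of the Croissant vocabulary, and the document has not been validated with mlcroissant. The ML-semantics layer is left out.

```python
import json

# Hypothetical dataset description; names and URLs are placeholders.
# The "@context" is abbreviated; a real Croissant file carries a much fuller one.
croissant_doc = {
    "@context": {
        "@vocab": "https://schema.org/",
        "sc": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "sc:Dataset",
    # Layer 1: dataset-level metadata, reusing schema.org's Dataset vocabulary.
    "name": "toy_reviews",
    "description": "A made-up review dataset used only for illustration.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/toy_reviews",
    # Layer 2: resources, i.e. the files that hold the data.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "reviews.csv",
            "contentUrl": "https://example.org/toy_reviews/reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
    # Layer 3: structure, i.e. how records and fields map onto the resources.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "reviews",
            "field": [
                {
                    "@type": "cr:Field",
                    "@id": "reviews/text",
                    "dataType": "sc:Text",
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "text"},
                    },
                },
                {
                    "@type": "cr:Field",
                    "@id": "reviews/label",
                    "dataType": "sc:Integer",
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "label"},
                    },
                },
            ],
        }
    ],
}


def summarize(doc):
    """Return the dataset name, its file ids, and the field ids per record set."""
    files = [f["@id"] for f in doc.get("distribution", [])]
    fields = {
        rs["@id"]: [f["@id"] for f in rs.get("field", [])]
        for rs in doc.get("recordSet", [])
    }
    return {"name": doc["name"], "files": files, "fields": fields}


summary = summarize(croissant_doc)
print(json.dumps(summary, indent=2))
```

This only walks the JSON structure by hand. For real loading, the mlcroissant library is the intended route; as far as I know its entry point is a `Dataset` object constructed from a JSON-LD file or URL, but that is not exercised here.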
