mlcommons/croissant
A schema.org-based metadata format and Python library (mlcroissant) for describing, loading, and integrating machine learning datasets.

Croissant is a high-level JSON-LD format for machine learning datasets. It describes a dataset in four layers within a single schema: dataset-level metadata, resource descriptions (the underlying files), data structure (how those files map to records and fields), and default ML semantics. Developed by MLCommons, it builds on schema.org’s Dataset vocabulary to make datasets easier to discover, use, and integrate with ML tools. The mlcroissant Python library provides functions to load Croissant-formatted datasets, inspect their metadata, and use them in ML workflows with frameworks like TensorFlow Datasets.
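To make the layering concrete, here is a minimal sketch of what a Croissant-style JSON-LD document can look like, built with only the Python standard library. It is an illustration under stated assumptions, not a validated Croissant file: the `@context` is abbreviated, the ML semantics layer is left out, and the dataset name, the `example.com` URL, the column names, and the helpers `build_minimal_croissant` and `list_fields` are all hypothetical.

```python
import json


def build_minimal_croissant(name, csv_url):
    """Return a simplified, Croissant-style JSON-LD document as a dict.

    Assumption: this is a trimmed illustration. A real Croissant file
    carries a fuller @context and more required properties.
    """
    return {
        # Abbreviated context: maps the short keys used below to vocabularies.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
            "conformsTo": "dct:conformsTo",
            "recordSet": "cr:recordSet",
            "field": "cr:field",
            "dataType": {"@id": "cr:dataType", "@type": "@vocab"},
            "source": "cr:source",
            "fileObject": "cr:fileObject",
            "extract": "cr:extract",
            "column": "cr:column",
        },
        # Layer 1: dataset-level metadata (schema.org Dataset).
        "@type": "sc:Dataset",
        "name": name,
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        # Layer 2: resources, i.e. the files that hold the data.
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": "data.csv",
                "contentUrl": csv_url,
                "encodingFormat": "text/csv",
            }
        ],
        # Layer 3: structure, i.e. how file contents map to records and fields.
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "@id": "examples",
                "field": [
                    {
                        "@type": "cr:Field",
                        "@id": "examples/text",
                        "dataType": "sc:Text",
                        "source": {
                            "fileObject": {"@id": "data.csv"},
                            "extract": {"column": "text"},
                        },
                    },
                    {
                        "@type": "cr:Field",
                        "@id": "examples/label",
                        "dataType": "sc:Integer",
                        "source": {
                            "fileObject": {"@id": "data.csv"},
                            "extract": {"column": "label"},
                        },
                    },
                ],
            }
        ],
    }


def list_fields(doc):
    """Collect the @id of every field across all record sets."""
    return [
        field["@id"]
        for record_set in doc.get("recordSet", [])
        for field in record_set.get("field", [])
    ]


if __name__ == "__main__":
    doc = build_minimal_croissant("toy-sentiment", "https://example.com/data.csv")
    print(json.dumps(doc, indent=2))
    print(list_fields(doc))
```

With mlcroissant installed, a file like this is typically loaded with `mlcroissant.Dataset(jsonld=...)` and iterated with `.records(record_set=...)`; check the library documentation for the current signatures before relying on them.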