Renumics/awesome-open-data-centric-ai
A curated collection of open-source tools for building data-centric AI workflows on unstructured data.

This repository aggregates open-source tooling for data-centric AI, a development paradigm focused on systematically engineering training data. The list covers tools for data curation, versioning, drift detection, active learning, synthetic data generation, noisy-label handling, and uncertainty estimation on unstructured data types, including images, audio, video, and text. It serves as a discovery resource for practitioners building ML workflows that use information from trained models to iteratively improve their datasets.