cleanlab/cleanvision

A Python library that automatically audits image datasets to find quality issues like duplicates, blur, and exposure problems.

1.2k stars · Python · Data Tooling · Computer Vision
Velocity · 7d: +0.8 ★ / day
Trend: steady

CleanVision is a data-centric AI package for automatically detecting potential issues in image datasets before applying machine learning. It checks for predefined issues including blurry images, under/over-exposed images, near-duplicates, and other visual quality problems. Users simply point it at a folder of images and run a few lines of Python to generate a report of dataset issues that should be addressed prior to model training.
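The "few lines of Python" mentioned above might look roughly like the following sketch. It assumes the `Imagelab` interface (`data_path`, `find_issues()`, `report()`, `issues`) as documented in the project's README, which may differ between releases. The `audit_folder` helper and its path check are hypothetical additions for illustration, not part of the library.

```python
from pathlib import Path


def audit_folder(folder: str):
    """Run CleanVision's default checks on a folder of images (illustrative helper)."""
    path = Path(folder)
    if not path.is_dir():
        # Hypothetical guard, not part of CleanVision: fail early on a bad path.
        raise NotADirectoryError(f"not a folder of images: {folder}")

    # Imported lazily so the path check above works without the package installed.
    # Assumes `pip install cleanvision` has been run.
    from cleanvision import Imagelab

    imagelab = Imagelab(data_path=str(path))
    imagelab.find_issues()  # run the default set of issue checks
    imagelab.report()       # print a summary of the issues found
    return imagelab.issues  # per-image results as a pandas DataFrame
```

Calling `audit_folder("path/to/images")` with a real image folder should print the report and return the per-image table, which can then be filtered to decide which images to drop or fix before training.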
