rom1504/img2dataset
A Python tool that downloads, resizes, and packages image URLs into datasets for ML training.

Velocity (7d): +2.5 ★/day · Trend: steady
img2dataset builds large image datasets by downloading images from a list of URLs, resizing them, and packaging them into training-ready formats. It can keep each image paired with its caption for multimodal training data, and it can process 100M URLs in 20 hours on a single machine, which works out to about 1,400 URLs per second on average. The tool is aimed at deep learning and multimodal model training pipelines.
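
To make the caption pairing and the throughput figure concrete, here is a minimal sketch in Python. It only assembles a command line and does not run it. The flag names follow the img2dataset README as best recalled and should be checked against the installed version. The file name `captions.tsv`, the folder `dataset_out`, the column names, and the helper `build_img2dataset_command` are hypothetical, chosen for illustration.

```python
import shlex


def build_img2dataset_command(
    url_list,
    output_folder,
    url_col="url",          # assumed name of the URL column in the input file
    caption_col="caption",  # assumed name of the caption column
    image_size=256,
    processes_count=16,
    thread_count=64,
):
    """Assemble (but do not run) an img2dataset invocation as an argument list.

    Hypothetical helper: flag names are taken from the project's README and
    should be verified against the version you have installed.
    """
    return [
        "img2dataset",
        "--url_list", url_list,
        "--input_format", "tsv",
        "--url_col", url_col,
        "--caption_col", caption_col,
        "--output_format", "webdataset",
        "--output_folder", output_folder,
        "--processes_count", str(processes_count),
        "--thread_count", str(thread_count),
        "--image_size", str(image_size),
    ]


# Hypothetical input file and output folder.
cmd = build_img2dataset_command("captions.tsv", "dataset_out")
print(shlex.join(cmd))

# Average throughput implied by "100M URLs in 20 hours".
urls_per_second = 100_000_000 / (20 * 3600)
print(f"{urls_per_second:.0f} URLs per second on average")
```

The argument list can be passed to `subprocess.run(cmd, check=True)` once img2dataset is installed. Building it as a list rather than a single string avoids shell quoting problems when paths contain spaces.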