
snorkel-team/snorkel

A system for programmatically generating and managing training data for machine learning using weak supervision techniques.

6k stars · Python · Data Tooling
snorkel
Velocity · 7d: +1.6 ★ / day
Trend: steady

Snorkel provides a framework for building and managing training datasets programmatically rather than through manual labeling. It leverages weak supervision sources such as heuristic rules, external knowledge bases, and probabilistic modeling to generate large-scale training data efficiently. The system originated from Stanford research on training data as the bottleneck in ML pipelines and has been deployed at organizations including Google.
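To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It does not use Snorkel's own API: the labeling functions, the `weak_label` helper, and the spam/ham task are all hypothetical names chosen for illustration, and a simple majority vote stands in for the probabilistic modeling the description mentions.

```python
from collections import Counter

# Label values; ABSTAIN means a heuristic has no opinion on an example.
ABSTAIN, HAM, SPAM = -1, 0, 1


# Hypothetical heuristic rules ("labeling functions"). Each one is noisy
# and covers only part of the data.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def weak_label(text):
    """Combine heuristic votes by majority; abstain on ties or no votes.

    Assumption: majority vote is a simplified stand-in for a learned
    probabilistic label model.
    """
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


examples = [
    "Claim your prize at http://example.com now",
    "See you at lunch",
    "The quarterly report is attached for your review",
    "Win prize today",
]
labels = [weak_label(t) for t in examples]
print(labels)
```

Examples that end up labeled `ABSTAIN` would typically be left out of the generated training set. A probabilistic model can improve on majority voting by weighting each heuristic according to its estimated accuracy.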
