snorkel-team/snorkel
A system for programmatically generating and managing training data for machine learning using weak supervision techniques.

Snorkel provides a framework for building and managing training datasets programmatically rather than through manual labeling. It leverages weak supervision sources such as heuristic rules, external knowledge bases, and probabilistic modeling to generate large-scale training data efficiently. The system originated from Stanford research on training data as the bottleneck in ML pipelines and has been deployed at organizations including Google.