Prodigy's cookbook: annotation recipes you can actually read
A public repo of commented, tweakable scripts for Explosion's commercial annotation tool.
What it does
This repository holds Python recipe scripts for Prodigy, Explosion’s scriptable data-annotation tool. Each recipe defines a data stream, an annotation interface, and optional model-in-the-loop logic for tasks like named-entity recognition, text classification, image bounding boxes, and terminology bootstrapping. You run them by passing the script path to Prodigy’s -F flag.
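As a rough illustration of that shape, a recipe is essentially a Python function that returns a dictionary of components. The sketch below leaves Prodigy out so it runs anywhere. In a real recipe the function would be registered with the @prodigy.recipe decorator, and the stream would usually come from one of Prodigy's loaders. The recipe name, helper functions, and sample texts are hypothetical.

```python
from typing import Dict, Iterator, List


def load_texts(texts: List[str]) -> Iterator[Dict]:
    """Data stream: yield one annotation task per input text."""
    for text in texts:
        yield {"text": text}


def add_options(stream: Iterator[Dict], labels: List[str]) -> Iterator[Dict]:
    """Attach the multiple-choice options to every task in the stream."""
    for task in stream:
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task


def example_choice_recipe(dataset: str, texts: List[str], labels: List[str]) -> Dict:
    """Hypothetical recipe body; a real one would sit under @prodigy.recipe."""
    stream = add_options(load_texts(texts), labels)
    return {
        "dataset": dataset,   # name of the dataset that receives the annotations
        "stream": stream,     # iterable of task dictionaries
        "view_id": "choice",  # annotation interface used to render each task
    }


components = example_choice_recipe(
    "demo_dataset", ["Great tool", "Too slow"], ["POSITIVE", "NEGATIVE"]
)
first_task = next(components["stream"])
```

The path of a file containing such a function is what goes after -F on the command line.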
The interesting bit
These aren’t the built-in recipes shipped with Prodigy. They have been deliberately edited to include comments, simplified logic, and explicit extension points. The README nudges you to swap prefer_uncertain() for prefer_high_scores(), write custom sorting generators, or inject filters (e.g., only ask about two-word entities). It’s documentation disguised as code.
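To make those extension points concrete, here is a minimal sketch of a custom sorter and a two-word-entity filter. It assumes the model step yields (score, example) tuples and that each example carries a "text" plus "spans" with character offsets. The function names and the margin parameter are invented for illustration, and the sorter is a deliberately simplified stand-in, not Prodigy's own prefer_uncertain() logic.

```python
from typing import Dict, Iterable, Iterator, Tuple

ScoredExample = Tuple[float, Dict]


def only_two_word_entities(scored: Iterable[ScoredExample]) -> Iterator[ScoredExample]:
    """Filter: pass through only examples whose spans all cover exactly two words."""
    for score, example in scored:
        text = example["text"]
        spans = example.get("spans", [])
        if spans and all(len(text[s["start"]:s["end"]].split()) == 2 for s in spans):
            yield score, example


def prefer_near_half(scored: Iterable[ScoredExample], margin: float = 0.2) -> Iterator[Dict]:
    """Simplified uncertainty sorter: keep examples scored within `margin` of 0.5."""
    for score, example in scored:
        if abs(score - 0.5) <= margin:
            yield example


# Hand-written scores standing in for model predictions.
scored_stream = [
    (0.55, {"text": "Ada Lovelace wrote notes", "spans": [{"start": 0, "end": 12, "label": "PERSON"}]}),
    (0.45, {"text": "Paris is busy", "spans": [{"start": 0, "end": 5, "label": "GPE"}]}),
    (0.95, {"text": "New York never sleeps", "spans": [{"start": 0, "end": 8, "label": "GPE"}]}),
]

selected = list(prefer_near_half(only_two_word_entities(scored_stream)))
```

Because both pieces are plain generators, they compose lazily: the filter drops the one-word "Paris" span, and the sorter then drops the confidently scored "New York" example.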
Key highlights
- NER recipes cover active learning (ner.teach), pattern matching (ner.match), manual span annotation (ner.manual), fuzzy pre-highlighting via spaczz, BERT word-piece tokenization, silver-to-gold correction, and A/B model evaluation.
- Text classification spans manual, correction, teach, and a plug-in-your-own-model template using a random dummy as stand-in.
- Image recipes include manual polygon/box drawing, captioning with PyTorch pre-population, and three variants of TensorFlow Object Detection API integration (frozen model, TF Serving, and trainable loop).
- Miscellaneous: multi-choice annotation, custom HTML question-answering, and community contributions including record linkage with dedupe.
- Requires a paid Prodigy license; support goes through the Prodigy forum, not GitHub issues.
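The plug-in-your-own-model idea from the text classification recipes boils down to two hooks: something that scores incoming examples and something that receives annotated answers. The following is a minimal sketch under that assumption. The class name, seed parameter, and update counter are illustrative and not copied from the repo.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple


class RandomDummyModel:
    """Stand-in model: assigns random scores and only counts the answers it sees."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.n_updates = 0

    def __call__(self, stream: Iterable[Dict]) -> Iterator[Tuple[float, Dict]]:
        """Score each incoming example with a random float in [0, 1)."""
        for example in stream:
            yield self.rng.random(), example

    def update(self, answers: List[Dict]) -> None:
        """A real model would train on the answers; the dummy just counts them."""
        self.n_updates += len(answers)


model = RandomDummyModel()
scored = list(model([{"text": "first"}, {"text": "second"}]))
model.update([{"text": "first", "answer": "accept"}, {"text": "second", "answer": "reject"}])
```

In a recipe, the scored stream would typically pass through a sorter before reaching the annotator, and the update method would be handed back as the recipe's update callback. Replacing the class with a wrapper around a real model keeps the rest of the workflow unchanged.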
Caveats
- The README warns these recipes are not identical to built-ins; simplifications may trade robustness for readability.
- Several image recipes depend on specific TensorFlow ecosystem versions (Object Detection API, TF Serving) that are notorious for dependency fragility; no version pinning is visible in the README.
Verdict
Worth a skim if you already own a Prodigy license and want to customize annotation workflows beyond the defaults. If you’re annotation-curious but haven’t bought in, this repo won’t run standalone and serves mainly as a feature preview.