explosion/prodigy-recipes

Prodigy's cookbook: annotation recipes you can actually read

A public repo of commented, tweakable scripts for Explosion's commercial annotation tool.

507 stars · Jupyter Notebook · Data Tooling · Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repository holds Python recipe scripts for Prodigy, Explosion’s scriptable data-annotation tool. Each recipe defines a data stream, an annotation interface, and optional model-in-the-loop logic for tasks like named-entity recognition, text classification, image bounding boxes, and terminology bootstrapping. You run a recipe by naming it on the Prodigy command line and pointing the -F flag at the script file.
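To make the "stream plus interface" idea concrete, here is a minimal sketch of the shape such a recipe function takes. It deliberately does not import Prodigy, so it runs anywhere; in a real recipe file the function would carry Prodigy's recipe decorator. The names `load_lines`, `add_options`, and `sentiment_choice` are hypothetical, and the returned keys and the `choice` interface follow Prodigy's documented component names as best understood here, so check them against the official docs before relying on them.

```python
import json
from typing import Dict, Iterable, Iterator, List


def load_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Turn JSONL lines into annotation tasks, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def add_options(stream: Iterable[Dict], labels: List[str]) -> Iterator[Dict]:
    """Attach one multiple-choice option per label to every task."""
    for task in stream:
        task = dict(task)  # copy so the incoming task is not mutated
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task


# Hypothetical recipe body. In a real recipe file this function would be
# registered with Prodigy's recipe decorator; that part is omitted here so
# the sketch stays runnable without a Prodigy installation.
def sentiment_choice(dataset: str, lines: Iterable[str], labels: List[str]) -> Dict:
    stream = add_options(load_lines(lines), labels)
    return {
        "dataset": dataset,   # where answers would be stored
        "stream": stream,     # lazy generator of tasks
        "view_id": "choice",  # assumed name of the multiple-choice interface
        "config": {"choice_style": "single"},
    }
```

The stream stays a generator end to end, so nothing is loaded or transformed until a task is actually requested.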

The interesting bit

These aren’t the built-in recipes shipped with Prodigy. They have been deliberately edited to include comments, simplified logic, and explicit extension points. The README nudges you to swap prefer_uncertain() for prefer_high_scores(), write custom sorting generators, or inject filters (e.g., only ask about two-word entities). It’s documentation disguised as code.
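As a rough illustration of those extension points, the sketch below shows a hand-written sorter and a two-word-entity filter in plain Python. It assumes the stream carries (score, example) pairs and that examples follow the usual text-plus-spans layout with character offsets. Both function names are hypothetical, and the fixed-margin rule is a deliberately simpler stand-in, not how Prodigy's own sorters decide what to show.

```python
from typing import Dict, Iterable, Iterator, Tuple


def prefer_near_half(
    scored_stream: Iterable[Tuple[float, Dict]], margin: float = 0.2
) -> Iterator[Dict]:
    """Yield only examples whose score lies within `margin` of 0.5.

    A simplified, stateless stand-in for an uncertainty sorter.
    """
    for score, example in scored_stream:
        if abs(score - 0.5) <= margin:
            yield example


def only_two_word_spans(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Keep examples that have spans and whose spans all cover two words."""
    for example in stream:
        text = example["text"]
        spans = example.get("spans", [])
        if spans and all(
            len(text[span["start"]:span["end"]].split()) == 2 for span in spans
        ):
            yield example
```

Because both are generators, they chain without buffering: `only_two_word_spans(prefer_near_half(scored_stream))` asks only about uncertain, two-word entities.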

Key highlights

  • NER recipes cover active learning (ner.teach), pattern matching (ner.match), manual span annotation (ner.manual), fuzzy pre-highlighting via spaczz, BERT word-piece tokenization, silver-to-gold correction, and A/B model evaluation.
  • Text classification spans manual, correction, teach, and a plug-in-your-own-model template using a random dummy as stand-in.
  • Image recipes include manual polygon/box drawing, image captioning with captions pre-populated by a PyTorch model, and three variants of TensorFlow Object Detection API integration (frozen model, TF Serving, and trainable loop).
  • Miscellaneous: multi-choice annotation, custom HTML question-answering, and community contributions including record linkage with dedupe.
  • Requires a paid Prodigy license; support goes through the Prodigy forum, not GitHub issues.
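The plug-in-your-own-model template from the text classification highlight boils down to two responsibilities: score incoming examples and accept annotated answers. The class below is a sketch of that contract under assumptions, not a copy of the repo's code: the class name, the method names `predict` and `update`, and the `n_updates` counter are hypothetical choices made for illustration.

```python
import random
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


class RandomDummyModel:
    """Stand-in model that assigns random scores and learns nothing."""

    def __init__(self, labels: List[str], seed: Optional[int] = None) -> None:
        self.labels = list(labels)
        self.rng = random.Random(seed)  # private RNG keeps runs reproducible
        self.n_updates = 0

    def predict(self, stream: Iterable[Dict]) -> Iterator[Tuple[float, Dict]]:
        """Yield one (score, task) pair per example and label."""
        for example in stream:
            for label in self.labels:
                task = dict(example)  # copy so each label gets its own task
                task["label"] = label
                yield self.rng.random(), task

    def update(self, answers: Iterable[Dict]) -> int:
        """Pretend to train: count the answers received so far."""
        self.n_updates += len(list(answers))
        return self.n_updates
```

Swapping in a real model would mean replacing the random score with an actual prediction and doing a training step inside `update`; the (score, task) output is what a sorter would then consume.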

Caveats

  • The README warns these recipes are not identical to built-ins; simplifications may trade robustness for readability.
  • Several image recipes depend on specific versions of TensorFlow ecosystem components (Object Detection API, TF Serving), which are notorious for dependency fragility; no version pinning is visible in the README.

Verdict

Worth a skim if you already own a Prodigy license and want to customize annotation workflows beyond the defaults. If you’re annotation-curious but haven’t bought in, this repo won’t run standalone and serves mainly as a feature preview.
