Let a model do the tedious clicking for you
Anno-Mage uses PyTorch detection models—including zero-shot OWL-v2—to suggest bounding boxes so you only correct the mistakes.

What it does
Anno-Mage is a semi-automatic image annotation tool that runs detection models in the background and proposes bounding boxes for your custom labels. You review, tweak, and confirm. It ships as both a pip-installable desktop app and a FastAPI + React web stack. Output is plain CSV or Pascal VOC XML, nothing proprietary.
The interesting bit
The zero-shot hook is the real labor-saver: feed it arbitrary text labels via OWL-v2 and the model will attempt to find matching objects without ever having seen training examples for them. For known categories, it falls back to a standard PyTorch RetinaNet. The same backend drives both the web UI and the PyPI package, so you’re not maintaining two divergent tools.
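To make the split between the two detectors concrete, here is a minimal sketch of how a label list might be divided between a closed-set model and text prompts. Everything in it is an assumption for illustration: plan_detection, KNOWN_CATEGORIES, and the "a photo of a ..." prompt phrasing are hypothetical and not taken from Anno-Mage's code.

```python
# Hypothetical sketch; none of these names come from Anno-Mage itself.
# A few COCO category names standing in for a closed-set detector's vocabulary.
KNOWN_CATEGORIES = {"person", "bicycle", "car", "dog", "cat"}


def plan_detection(labels):
    """Split labels into ones a closed-set detector knows and ones needing text prompts."""
    closed_set, zero_shot_prompts = [], []
    for label in labels:
        name = label.strip().lower()
        if name in KNOWN_CATEGORIES:
            closed_set.append(name)
        else:
            # Prompt phrasing is an assumption, not a documented requirement.
            zero_shot_prompts.append(f"a photo of a {name}")
    return {"closed_set": closed_set, "zero_shot_prompts": zero_shot_prompts}


plan = plan_detection(["Dog", "traffic cone", "car"])
print(plan["closed_set"])
print(plan["zero_shot_prompts"])
```

The practical upshot is the same either way: labels that match a familiar category can go to the trained detector, and anything else has to be described well enough in text for the open-vocabulary model to find it.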
Key highlights
- OWL-v2 open-vocabulary detection: describe objects in plain text, no retraining
- Dual interface: pip install anno-mage for local use, or run the FastAPI/React web app
- Annotations land as CSV or Pascal VOC XML in ~/.anno-mage/annotations/
- CI/CD pipeline auto-publishes to PyPI on version tags via Trusted Publishers (no API tokens)
- Demo GIF shows the actual interaction loop: model proposes, human adjusts
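For a sense of what those two output formats look like, here is a small sketch that serializes one confirmed box both ways using only the standard library. The Pascal VOC element names are the conventional ones; the CSV column order, the box dictionary shape, and the function names boxes_to_csv and boxes_to_voc are assumptions for illustration and may not match what Anno-Mage actually writes.

```python
import csv
import io
import xml.etree.ElementTree as ET


def boxes_to_csv(image_name, boxes):
    """Return CSV text with one row per box (column order is an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["image", "xmin", "ymin", "xmax", "ymax", "label"])
    for b in boxes:
        writer.writerow([image_name, b["xmin"], b["ymin"], b["xmax"], b["ymax"], b["label"]])
    return buf.getvalue()


def boxes_to_voc(image_name, width, height, boxes):
    """Return a Pascal VOC style XML string for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumes a 3-channel RGB image
    for b in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = b["label"]
        bndbox = ET.SubElement(obj, "bndbox")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            ET.SubElement(bndbox, key).text = str(b[key])
    return ET.tostring(root, encoding="unicode")


boxes = [{"label": "dog", "xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}]
print(boxes_to_csv("img_001.jpg", boxes))
print(boxes_to_voc("img_001.jpg", 640, 480, boxes))
```

Both functions return strings rather than touching disk, so the sketch makes no claim about file naming inside the annotations directory.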
Caveats
- The README notes the web app exists but pushes you to web/README.md for details; the main docs are thin on model performance or hardware requirements
- Zero-shot detection is convenient but historically prone to false positives; the project doesn’t quantify accuracy or correction rates
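Since the project publishes no correction-rate numbers, one option is to measure them yourself on a small sample. The sketch below is not part of Anno-Mage; it assumes you have kept both the model's proposed boxes and the boxes you finally confirmed, as (xmin, ymin, xmax, ymax) tuples, and it counts a proposal as accepted when it overlaps some confirmed box with an intersection-over-union at or above a threshold. The function names and the 0.9 default are illustrative choices, and labels are ignored for brevity.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def acceptance_rate(proposed, confirmed, threshold=0.9):
    """Fraction of proposed boxes that survived review nearly unchanged."""
    if not proposed:
        return 0.0
    kept = sum(1 for p in proposed if any(iou(p, c) >= threshold for c in confirmed))
    return kept / len(proposed)


proposed = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
confirmed = [(0, 0, 10, 10), (20, 20, 30, 32)]  # second box was resized, third was deleted
print(acceptance_rate(proposed, confirmed))       # strict: only the untouched box counts
print(acceptance_rate(proposed, confirmed, 0.8))  # looser: the lightly resized box counts too
```

A low rate on your own images is a reasonable signal that the text prompts need rewording or that manual labeling would be faster for that category.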
Verdict
Worth a spin if you’re labeling hundreds of images and your categories are either COCO-standard or describable in a short phrase. Skip it if you need pixel-perfect segmentation or rigorous QA workflows: this is bounding-box-only territory.