A Swiss Army knife for AI-assisted image labeling
X-AnyLabeling wraps two dozen vision models into a single desktop app so you can stop drawing bounding boxes by hand.

What it does
X-AnyLabeling is a desktop annotation tool (PyQt6, Python 3.11+) that runs pre-trained vision models locally to auto-label images and videos. You point it at a folder, pick a model, and it spits out polygons, boxes, masks, keypoints, or text annotations in standard formats like COCO, YOLO, or VOC. It also handles less common tasks: rotated boxes for aerial imagery, 3D cuboids, video tracking, document parsing, and even VQA-style captioning.
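To make the export formats concrete, here is a minimal sketch of how the same bounding box looks in COCO, VOC, and YOLO conventions. It follows the formats' published conventions, not X-AnyLabeling's exporter code; the function names are hypothetical and exist only for illustration.

```python
# Illustrative helpers (hypothetical names, not part of X-AnyLabeling).
# COCO stores a box as [x_min, y_min, width, height] in pixels.

def coco_to_voc(bbox):
    """Return a VOC-style (x_min, y_min, x_max, y_max) box in pixels."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_yolo(bbox, img_w, img_h):
    """Return a YOLO-style (x_center, y_center, width, height) box,
    with every value normalized by the image size."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLO label file for a COCO-style box."""
    xc, yc, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


box = [100, 50, 200, 100]          # COCO box in a 400x200 image
print(coco_to_voc(box))            # corner coordinates
print(yolo_line(0, box, 400, 200)) # normalized center/size line
```

The practical difference is that COCO and VOC keep pixel coordinates while YOLO normalizes by image size, so converting to or from YOLO always needs the image dimensions.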
The interesting bit
The breadth is the point. The built-in model zoo spans everything from YOLO variants and SAM 1/2/3 to vision-language models (Qwen3-VL, Gemini, ChatGPT via API), OCR pipelines, and pose estimators. Most run through ONNX Runtime or TensorRT locally, though some heavier VLM backends presumably need a GPU or remote endpoint. The project also ships a separate server component for remote inference if your workstation can’t handle the load.
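For readers unfamiliar with how a local ONNX Runtime backend chooses hardware, the sketch below shows one common pattern: rank execution providers by preference and fall back to CPU. This is an assumption about typical usage, not X-AnyLabeling's actual selection logic, and the project's TensorRT support may use TensorRT directly rather than ONNX Runtime's TensorRT provider. The provider strings are ONNX Runtime's own identifiers; `pick_providers` is a hypothetical helper.

```python
# Hypothetical helper: order ONNX Runtime execution providers by preference.
PREFERRED = (
    "TensorrtExecutionProvider",  # NVIDIA TensorRT, when built in
    "CUDAExecutionProvider",      # plain CUDA GPU execution
    "CPUExecutionProvider",       # always-available fallback
)


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available,
    in preference order; default to CPU if none match."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


# Usage with ONNX Runtime installed (the model path is a placeholder):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
print(pick_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

On a machine without a GPU build of ONNX Runtime, the list collapses to the CPU provider, which is where the remote server component becomes attractive for heavier models.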
Key highlights
- Supports 20+ task types: detection, segmentation, pose, depth, matting, tracking, OCR, grounding, counting, lane detection, and more.
- Imports/exports COCO, VOC, YOLO, DOTA, MOT, MASK, PPOCR, MMGD, VLM-R1, ShareGPT.
- Backends: ONNX Runtime, TensorRT, OpenCV DNN.
- UI localizations: English, Chinese, Japanese, Korean.
- Recent additions include SAM 3 text-grounded segmentation, a video classifier panel with timeline editing, and PaddleOCR document parsing.
Caveats
- The README claims “industrial-grade” but doesn’t specify performance benchmarks or hardware requirements for the heavier models.
- PyQt6 migration is still marked Beta as of March 2026.
- License is GPL-3.0, so any derivative you distribute must ship its source under the same license; commercial derivatives must also retain branding and source attribution.
Verdict
Worth a look if you’re building computer-vision datasets and tired of context-switching between five different tools. Skip it if you need a headless, CI-friendly pipeline: this is a GUI-first application with a visual workflow.