midas-research/audino

An audio labeler that isn't CVAT, but is built on it

Audino wraps CVAT's backend with a React frontend to make speech annotation slightly less painful.

1.1k stars · TypeScript · Data Tooling
Velocity (7 days): +0.5 ★/day · Trend: steady

What it does

Audino v2.0 is a browser-based tool for annotating audio (think transcription, speaker diarization, voice activity detection, and emotion tagging). It packages a React frontend around a CVAT-derived backend, deploys via Docker Compose, and exports data in formats meant to play nice with downstream ML pipelines.
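All four task types can be modeled as labeled time spans over a clip, which makes "annotating audio" more concrete. The TypeScript sketch below assumes that model. `Segment`, `isValid`, and `coveredSeconds` are hypothetical names, and the shape is not Audino's export schema, which the README leaves unclear.

```typescript
// Hypothetical segment model: NOT Audino's export schema. It only illustrates
// that transcription, diarization, VAD, and emotion tags reduce to labeled spans.
type TaskKind = "transcription" | "diarization" | "vad" | "emotion";

interface Segment {
  kind: TaskKind;
  startSec: number; // span start, in seconds from the beginning of the clip
  endSec: number;   // span end, in seconds (exclusive)
  label: string;    // transcript text, speaker ID, "speech", or an emotion tag
}

// Reject spans that are empty, reversed, or fall outside the clip.
function isValid(seg: Segment, clipSec: number): boolean {
  return seg.startSec >= 0 && seg.startSec < seg.endSec && seg.endSec <= clipSec;
}

// Total seconds covered by the spans, counting overlaps once:
// sort by start, then merge overlapping or touching intervals.
function coveredSeconds(segs: Segment[]): number {
  const sorted = [...segs].sort((a, b) => a.startSec - b.startSec);
  let total = 0;
  let curStart = -Infinity;
  let curEnd = -Infinity;
  for (const s of sorted) {
    if (s.startSec > curEnd) {
      // Gap found: close the current run and start a new one.
      if (curEnd > curStart) total += curEnd - curStart;
      curStart = s.startSec;
      curEnd = s.endSec;
    } else {
      // Overlap: extend the current run.
      curEnd = Math.max(curEnd, s.endSec);
    }
  }
  if (curEnd > curStart) total += curEnd - curStart;
  return total;
}

const vad: Segment[] = [
  { kind: "vad", startSec: 0.0, endSec: 1.5, label: "speech" },
  { kind: "vad", startSec: 1.0, endSec: 2.0, label: "speech" },
  { kind: "vad", startSec: 3.0, endSec: 4.0, label: "speech" },
];
console.log(vad.every((s) => isValid(s, 5))); // true
console.log(coveredSeconds(vad)); // 3
```

Merging overlapping spans before summing keeps a region that was labeled twice from being counted twice, which matters whenever VAD or diarization segments overlap.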

The interesting bit

The project is essentially a specialized skin over CVAT’s infrastructure. The README’s development guide has you installing CVAT dependencies, running cvat_server, and browsing CVAT docs, which is either pragmatic reuse or an admission that building annotation UIs from scratch is a slog. The emoji support in labels is a small but humanizing touch in an otherwise utilitarian space.

Key highlights

  • Docker-first deployment; local dev requires Ubuntu 22.04/20.04, Python 3.10+, Node 20, and patience
  • User-level project/task/job hierarchy with role-based access; a superuser account must be created before first use
  • Multi-language and emoji-capable labels
  • Sponsored by Human Protocol, which uses it as an annotation service layer
  • CC BY-NC 4.0 license: commercial use needs a conversation with the maintainers

Caveats

  • v2.0 is “actively under development” and the migration from the original Audino is incomplete
  • New users register with zero permissions by default; admin intervention is required before anyone can view tasks
  • The README’s feature list is vague on which export formats are actually supported
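The default-deny registration behavior and the project/task/job hierarchy can be sketched as follows. This is a hypothetical model, not Audino's or CVAT's actual access-control code: the role names (`none`, `annotator`, `admin`), the rule that annotators see only tasks containing a job assigned to them, and the functions `register`, `grantRole`, and `canViewTask` are all assumptions made for illustration.

```typescript
// Hypothetical permission model; role names and rules are illustrative only.
type Role = "none" | "annotator" | "admin";

interface User {
  name: string;
  role: Role;          // new registrations start at "none" (default deny)
  isSuperuser: boolean;
}

// Project -> task -> job hierarchy; a job may be assigned to one user.
interface Job { id: number; assignee?: string }
interface Task { id: number; jobs: Job[] }
interface Project { id: number; owner: string; tasks: Task[] }

// Default deny: a freshly registered account carries no permissions.
function register(name: string): User {
  return { name, role: "none", isSuperuser: false };
}

// Only a superuser may grant a role (the "admin intervention" step).
function grantRole(actor: User, target: User, role: Role): User {
  if (!actor.isSuperuser) throw new Error(`${actor.name} cannot grant roles`);
  return { ...target, role };
}

// Superusers and admins see every task; annotators see only tasks that
// contain a job assigned to them; users with role "none" see nothing.
function canViewTask(user: User, task: Task): boolean {
  if (user.isSuperuser || user.role === "admin") return true;
  if (user.role === "annotator") {
    return task.jobs.some((j) => j.assignee === user.name);
  }
  return false;
}

const root: User = { name: "root", role: "admin", isSuperuser: true };
const project: Project = {
  id: 1,
  owner: "root",
  tasks: [{ id: 10, jobs: [{ id: 100, assignee: "alice" }] }],
};
const task = project.tasks[0];

let alice = register("alice");
console.log(canViewTask(alice, task)); // false: no permissions yet
alice = grantRole(root, alice, "annotator");
console.log(canViewTask(alice, task)); // true
```

Returning a new `User` from `grantRole` instead of mutating keeps the sketch easy to test; a real deployment would persist the role change server-side.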

Verdict

Worth a look if you need a self-hosted audio annotation layer and already tolerate CVAT’s complexity. Skip it if you want something that works out of the box for non-technical annotators, or if the non-commercial license is a dealbreaker.
