HumanSignal/label-studio

The Swiss Army knife of data labeling, now with 27k GitHub stars

Label Studio is an open-source tool for labeling images, text, audio, video, and time series, then exporting to model formats.

27.5k stars · TypeScript · Data Tooling
Velocity (7d): +11 ★/day · Trend: steady

What it does

Label Studio is a web-based data labeling tool that handles images, text, audio, video, HTML, and time series through a browser UI. Multiple users can work on multiple projects, with annotations tied to accounts. Data imports from files (JSON, CSV, TSV, RAR, ZIP) or cloud storage (S3, GCS), and exports to various ML model formats. It runs locally via Docker, pip, or poetry, or deploys to Heroku, Azure, or GCP with one-click buttons.
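As a rough illustration of the JSON import path, the sketch below wraps raw strings in the `{"data": {...}}` task envelope that Label Studio accepts for JSON imports. The helper name `build_tasks` and the `text` key are assumptions made for this example; the key has to match whatever variable your labeling configuration references.

```python
import json


def build_tasks(texts):
    """Wrap raw strings as tasks: one {"data": {...}} object per item.

    The "text" key is an assumption for this sketch; it must match the
    variable name used in the project's labeling configuration.
    """
    return [{"data": {"text": t}} for t in texts]


tasks = build_tasks([
    "The battery lasts all day.",
    "Screen cracked within a week.",
])

# Serialize to a JSON string that could be saved as tasks.json and imported.
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

The resulting file could then be uploaded through the import dialog or placed in a connected storage bucket.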

The interesting bit

The project doesn’t just store annotations; it plugs into your training loop. You can connect an ML backend server to pre-label data, run online learning as new annotations arrive, or do active learning by labeling only the hardest examples. The frontend (React + mobx-state-tree) and backend are separable if you want to embed pieces in an existing pipeline.
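To make "labeling only the hardest examples" concrete, here is a minimal uncertainty-sampling sketch in plain Python. It is not Label Studio's own implementation: the function names, the task IDs, and the choice of entropy as the difficulty score are all assumptions for illustration. The idea is to rank unlabeled tasks by how unsure the model is and send the top few to annotators first.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_hardest(predictions, k):
    """Return the k task IDs whose predicted distributions are most uncertain.

    predictions: dict mapping a (hypothetical) task ID to class probabilities.
    """
    ranked = sorted(predictions, key=lambda tid: entropy(predictions[tid]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabeled tasks.
predictions = {
    101: [0.98, 0.02],  # confident
    102: [0.55, 0.45],  # nearly a coin flip
    103: [0.70, 0.30],  # somewhat unsure
}

hardest = pick_hardest(predictions, 2)
print(hardest)  # [102, 103]
```

Other scores (smallest margin, lowest top probability) would slot into the same ranking step.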

Key highlights

  • Supports pre-labeling, online learning, and active learning via ML backend integration
  • Configurable label formats through a dedicated configuration language
  • REST API for pipeline embedding
  • Docker Compose stack includes Nginx and PostgreSQL for production use
  • Separate converter library to encode labels for “your favorite machine learning library”
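The configuration language and REST API fit together roughly as follows. This sketch defines a small XML labeling config for text sentiment and builds, without sending, an HTTP request that would create a project with it. Treat the specifics as assumptions: the `/api/projects/` path, the `Token` authorization scheme, the `localhost:8080` address, and the placeholder token reflect a common local setup and may differ across versions and deployments.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# A small labeling config: show the task's "text" field and offer two choices.
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
""".strip()


def build_create_project_request(base_url, token, title, label_config):
    """Build (but do not send) a project-creation request.

    Assumptions: endpoint path and "Token" auth scheme; check your
    instance's API docs before relying on either.
    """
    ET.fromstring(label_config)  # raises ParseError if the XML is malformed
    body = json.dumps({"title": title, "label_config": label_config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/projects/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


req = build_create_project_request(
    "http://localhost:8080", "YOUR_TOKEN", "Sentiment demo", LABEL_CONFIG
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a live server.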

Caveats

  • Windows requires manual wheel installs for lxml from Gohlke builds
  • The MinIO + Docker Compose setup needs hosts file entries if you lack a static IP
  • Postgres testing in Docker containers requires additional configuration beyond the defaults

Verdict

Worth a look if you’re building a data pipeline and need one labeling tool across multiple data types. Skip it if you just need quick image bounding boxes and don’t care about export formats or team workflows.
