pedropro/TACO

A dataset for teaching AI to spot litter in the wild

TACO provides manually segmented images of trash on roads, on beaches, and in woods, for training object detection models to find litter where people leave it.

736 stars · Jupyter Notebook · Computer Vision · Data Tooling
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

TACO is a dataset and toolkit for training object detection models to identify litter in outdoor environments. It pairs Flickr-hosted images with pixel-level segmentation annotations in COCO format, and adds scripts to download the data, split it into train/val/test sets, and run a modified Mask R-CNN implementation. The project also hosts a web tool at tacodataset.org for collecting more crowd-sourced annotations.
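For a sense of what a train/val/test split involves on COCO-format data, here is a minimal sketch. It is not the repository's own split script: the function name `split_coco` and its parameters are hypothetical, and it assumes only the standard COCO keys (`images`, `annotations`, `categories`).

```python
import random


def split_coco(dataset, val_frac=0.1, test_frac=0.1, seed=0):
    """Split a COCO-style dict into train/val/test dicts, image by image.

    Hypothetical helper for illustration; not part of the TACO toolkit.
    """
    images = list(dataset["images"])
    random.Random(seed).shuffle(images)  # seeded, so the split is reproducible

    n_test = int(len(images) * test_frac)
    n_val = int(len(images) * val_frac)
    parts = {
        "test": images[:n_test],
        "val": images[n_test:n_test + n_val],
        "train": images[n_test + n_val:],
    }

    splits = {}
    for name, imgs in parts.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            # Keep each annotation with the image it belongs to.
            "annotations": [
                a for a in dataset["annotations"] if a["image_id"] in ids
            ],
            # Every split shares the full category list.
            "categories": dataset["categories"],
        }
    return splits
```

Splitting by image rather than by annotation keeps all objects from one photo in the same subset, which avoids leaking test images into training.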

The interesting bit

The taxonomy is hierarchical and the class distribution is heavily imbalanced: most categories have very few examples. To compensate, the authors provide pre-built class maps that collapse rare trash types into dominant ones like cans, bottles, and plastic bags, and you can define your own groupings. The “unofficial” annotations submitted via the website are kept separate and unvetted, which is a refreshingly honest way to handle crowd data.
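To make the class-map idea concrete, here is a rough sketch of collapsing fine-grained categories into coarser ones on a COCO-format dictionary. The helper `apply_class_map`, the `fallback` bucket, and the category names in the usage note are illustrative assumptions, not the project's own maps or code.

```python
def apply_class_map(dataset, class_map, fallback="Other"):
    """Return a new COCO-style dict with categories merged per class_map.

    class_map maps an original category name to a target name; any
    category not listed lands in `fallback`. Hypothetical helper.
    """
    old_name = {c["id"]: c["name"] for c in dataset["categories"]}

    # Collect target names in first-seen order and give them fresh ids.
    new_names = []
    for cat in dataset["categories"]:
        target = class_map.get(cat["name"], fallback)
        if target not in new_names:
            new_names.append(target)
    new_id = {name: i for i, name in enumerate(new_names, start=1)}

    # Point every annotation at its merged category; masks are untouched.
    annotations = []
    for ann in dataset["annotations"]:
        target = class_map.get(old_name[ann["category_id"]], fallback)
        annotations.append({**ann, "category_id": new_id[target]})

    return {
        "images": dataset["images"],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in new_id.items()],
    }
```

With a map such as `{"Drink can": "Can", "Food can": "Can", "Clear plastic bottle": "Bottle"}`, both can types share one id and anything unlisted falls into "Other". Defining your own grouping then amounts to editing that dictionary.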

Key highlights

  • Annotations follow standard COCO format, so it plugs into existing detection pipelines with minimal friction
  • Includes a working Mask R-CNN detector fork in /detector with dataset splitting and config scripts
  • Images are hosted on Flickr, not in the repo itself; download.py fetches them on demand
  • Provides both official reviewed annotations and a separate annotations_unofficial.json for crowd submissions
  • Paper and citation info available at arXiv:2003.06975

Caveats

  • The dataset is “still relatively small” per the authors’ own admission
  • Most original classes have very few annotations, forcing you to merge or drop categories
  • Requires separate installation of the COCO Python API to run the demo notebook
  • Unofficial annotations are explicitly flagged as potentially inaccurate or poorly segmented
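
Before merging or dropping categories, it helps to see how thin each one is. The sketch below counts annotations per category in a COCO-format dictionary; the helper names and the `min_count` threshold are assumptions for illustration, not part of the TACO toolkit.

```python
from collections import Counter


def annotation_counts(dataset):
    """Count annotations per category name, including empty categories."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    # Start every category at zero so unused ones still show up.
    counts = Counter({name: 0 for name in names.values()})
    counts.update(names[a["category_id"]] for a in dataset["annotations"])
    return counts


def rare_categories(dataset, min_count):
    """Names of categories with fewer than min_count annotations."""
    counts = annotation_counts(dataset)
    return sorted(name for name, n in counts.items() if n < min_count)
```

The names returned by `rare_categories` are the candidates to fold into a broader class or exclude from training.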

Verdict

Worth a look if you’re building litter-detection models for drones, beach-cleaning robots, or environmental monitoring. Skip it if you need a large, balanced dataset out of the box: this is a growing community effort, not a finished product.
