datitran/raccoon_dataset

200 raccoons walk into a TensorFlow pipeline

A small, scrappy image dataset for anyone who wants to train an object detector without wrangling data formats first.

1.3k stars · Jupyter Notebook · Data Tooling · Computer Vision
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This repo packages 200 raccoon images scraped from Google and Pixabay into a ready-to-ingest bundle for TensorFlow’s Object Detection API. The author used it to train a raccoon detector and wrote about the process on Medium. You get PASCAL VOC annotations, TFRecord generators, CSV converters, and a pre-baked train/validation split (160/40).

The interesting bit

The value isn’t the raccoons; it’s the plumbing. Most object-detection tutorials drown you in format-shuffling, while this repo hands you working scripts (generate_tfrecord.py, xml_to_csv.py) and a folder layout that the TF API expects out of the box. It’s a minimum viable dataset, not a benchmark.
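To make the "plumbing" concrete, here is a minimal sketch of the XML → CSV step using only the Python standard library. It is not the repo's xml_to_csv.py; the sample annotation, its values, and the column names are assumptions chosen to illustrate the usual PASCAL VOC layout (one `<object>` element per bounding box).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical PASCAL VOC annotation; file name and numbers are illustrative only.
SAMPLE_XML = """<annotation>
  <filename>raccoon-1.jpg</filename>
  <size><width>650</width><height>417</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>81</xmin><ymin>88</ymin><xmax>522</xmax><ymax>408</ymax></bndbox>
  </object>
</annotation>"""

# Assumed column order for the flat CSV (one row per bounding box).
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_to_rows(xml_text):
    """Flatten one VOC annotation into a list of dicts, one per <object>."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "width": width,
            "height": height,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = voc_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

A flat CSV like this is a convenient intermediate form: it is easy to inspect or re-split before the rows are packed into TFRecords.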

Key highlights

  • 200 annotated images with bounding boxes in PASCAL VOC XML format
  • Ready-made 80/20 train-validation split (160 training images, 40 validation images)
  • Includes Jupyter notebooks for visualizing boxes and re-splitting labels
  • Scripts to convert XML → CSV → TFRecord without leaving the repo
  • Frozen model and pipeline config from the author’s training run included
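If you want to redo the 80/20 split yourself, a reproducible version might look like the sketch below. This is not the author's notebook: the function name, the seed, and the file-naming pattern are hypothetical. It splits by image file name rather than by box row, so every box from one image stays on the same side of the split.

```python
import random


def split_filenames(filenames, train_fraction=0.8, seed=42):
    """Shuffle file names with a fixed seed and cut them into train/validation lists."""
    names = sorted(filenames)          # sort first so the result does not depend on input order
    rng = random.Random(seed)          # local RNG; leaves the global random state alone
    rng.shuffle(names)
    cut = int(round(len(names) * train_fraction))
    return names[:cut], names[cut:]


# Hypothetical file names standing in for the 200 images.
filenames = [f"raccoon-{i}.jpg" for i in range(1, 201)]
train, val = split_filenames(filenames)
print(len(train), len(val))  # 160 40
```

Fixing the seed means the same images land in validation on every run, which keeps evaluation numbers comparable between experiments.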

Caveats

  • Image provenance is “Google and Pixabay”—licensing and quality are uneven and not individually documented
  • 200 images is tiny by modern standards; expect overfitting without heavy augmentation or pre-trained weights
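On the augmentation point: any geometric augmentation has to transform the bounding boxes along with the pixels. The simplest case is a horizontal flip, sketched below for boxes in VOC-style pixel coordinates (xmin, ymin, xmax, ymax). The function name and the example numbers are illustrative assumptions, not part of the repo.

```python
def flip_box_horizontal(box, image_width):
    """Mirror a (xmin, ymin, xmax, ymax) pixel box across the image's vertical center line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa; y is unchanged.
    return (image_width - xmax, ymin, image_width - xmin, ymax)


box = (81, 88, 522, 408)            # illustrative box in a 650-pixel-wide image
flipped = flip_box_horizontal(box, image_width=650)
print(flipped)  # (128, 88, 569, 408)
```

Flipping twice returns the original box, which makes a handy sanity check when wiring augmentation into a data pipeline.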

Verdict

Grab this if you’re learning the TF Object Detection API and want a working end-to-end example that isn’t COCO. Skip it if you need a production-ready detector or care about clean licensing provenance.
