ruotianluo/self-critical.pytorch

A thousand stars for teaching neural nets to caption images

This repo implements self-critical reinforcement learning for image captioning, plus the kitchen sink of training tricks.

Velocity (7d): +0.3 ★/day · Trend: steady

What it does

A PyTorch research codebase for training image captioning models on COCO or Flickr30k. It covers the full pipeline: data prep, cross-entropy pretraining, self-critical RL fine-tuning, and evaluation with standard metrics (BLEU, METEOR, CIDEr). There’s also a simple HTML visualizer for browsing results.

The interesting bit

Self-critical sequence training is the headline feature: after 30 epochs of cross-entropy training, switching to RL with CIDEr as the reward reportedly pushes the CIDEr score to ~1.05. The author also quietly added DistributedDataParallel support via pytorch-lightning and a Transformer captioning model, making this more of a living toolkit than a one-paper reproduction.
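
For readers new to the idea, here is a minimal sketch of the self-critical objective as it is usually described: REINFORCE on a sampled caption, with the reward of the model's own greedy caption as the baseline. It is plain Python rather than PyTorch and is not taken from the repo. The function name, arguments, and numbers are assumptions made for illustration, and the rewards stand in for whatever CIDEr scorer a real pipeline would call.

```python
import math


def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """REINFORCE loss for one caption, using the greedy caption's reward as baseline.

    sample_logprobs: log-probabilities the model assigned to each sampled token.
    sample_reward:   score (e.g. CIDEr) of the sampled caption.
    greedy_reward:   score of the greedy-decoded caption for the same image.
    """
    # The advantage is treated as a constant: no gradient flows through the rewards.
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the likelihood of samples that beat the greedy
    # baseline and lowers it for samples that score worse.
    return -advantage * sum(sample_logprobs)


# Hypothetical numbers, for illustration only.
logprobs = [math.log(0.5), math.log(0.25)]
loss_better = self_critical_loss(logprobs, sample_reward=1.2, greedy_reward=1.0)
loss_worse = self_critical_loss(logprobs, sample_reward=0.8, greedy_reward=1.0)
```

The two calls differ only in whether the sample beats the greedy caption, so the losses have opposite signs. When the two rewards are equal the loss is zero, which is why no separately learned baseline is needed.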

Key highlights

  • Supports self-critical RL, bottom-up attention features, test-time ensembling, and Transformer architectures
  • YAML configs plus command-line overrides for training; TensorBoard logging built in
  • Pretrained models available; evaluation works on raw image folders or standard splits
  • Colab demo notebook provided for quick experimentation
  • Can be installed as an editable pip package if the raw scripts misbehave
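
The "YAML configs plus command-line overrides" pattern from the list above can be sketched roughly as follows. This is a generic illustration, not the repo's actual option handling: the option names, the `apply_overrides` helper, and the flat key/value override format are all hypothetical, and a plain dictionary stands in for a parsed YAML file.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with KEY VALUE pairs from `overrides` applied on top."""
    merged = copy.deepcopy(config)
    # Walk the flat list two items at a time: key, value, key, value, ...
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        if key not in merged:
            raise KeyError(f"unknown option: {key}")
        # Cast the command-line string to the type of the default value.
        merged[key] = type(merged[key])(raw)
    return merged


# Stand-in for values loaded from a YAML file; all names are hypothetical.
defaults = {"batch_size": 10, "learning_rate": 0.0005, "caption_model": "fc"}
config = apply_overrides(defaults, ["batch_size", "32", "caption_model", "transformer"])
```

Casting by the default's type keeps the sketch short, but it would mishandle booleans (`bool("False")` is `True`), so a real implementation needs an explicit case for them.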

Caveats

  • No CPU support at all; author notes “there’s no point using cpus to train” and CPU inference needs a custom request
  • Raw-image evaluation explicitly doesn’t work for bottom-up feature models
  • Live demo not implemented; “welcome pull request”

Verdict

Worth a look if you’re doing image captioning research and want a battle-tested PyTorch base with RL training already wired up. Skip it if you need a production API or CPU inference — this is a training rig, not a product.
