kenshohara/video-classification-3d-cnn-pytorch

Pretrained 3D ResNet: drop in a video, get action labels

A straightforward inference wrapper for spatiotemporal CNNs trained on 400 human actions.

1.1k stars · Python · Computer Vision · ML Frameworks
Velocity (7d): +0.4 ★ / day · Trend: steady

What it does

Feed it a folder of videos and a pretrained 3D ResNet (ResNet-34 or ResNeXt-101) and it spits out JSON: either class scores across the 400 Kinetics action categories or 512-dim feature vectors, with one result per 16-frame segment. There’s also a small visualization script that overlays predictions back onto the source video.
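To make the 16-frame granularity concrete, here is a minimal Python sketch of how a video's frames might be divided into consecutive, non-overlapping clips. The function name `clip_ranges` and the decision to drop leftover frames are assumptions made for illustration; the source does not say how the tool treats a trailing partial clip.

```python
def clip_ranges(n_frames, clip_len=16):
    """Return (start, end) frame-index pairs for consecutive clips.

    Indices are 0-based and `end` is exclusive. Trailing frames that do
    not fill a whole clip are dropped here (an assumption, not confirmed
    behavior of the actual tool).
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    last_start = n_frames - clip_len
    return [(s, s + clip_len) for s in range(0, last_start + 1, clip_len)]


if __name__ == "__main__":
    # A 50-frame video yields three full clips; frames 48 and 49 are left over.
    print(clip_ranges(50))
```

Under these assumptions, a video shorter than 16 frames produces no clips at all, so it is worth checking clip counts before expecting any output for very short inputs.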

The interesting bit

The project is essentially a clean inference harness around the author’s earlier training codebase. The value isn’t novelty; it’s convenience. You don’t retrain: you download the weights, point the script at ~/videos, and run. The 2017 paper asked whether 3D CNNs can retrace the history of 2D CNNs, and this repo answers with a pragmatic “yes, and here’s the tool.”

Key highlights

  • Pretrained on Kinetics-400 (400 action classes)
  • Two modes: score (class predictions) or feature (512-dim embeddings taken after global average pooling)
  • Supports ResNeXt-101, which the authors note performed best
  • Includes a result visualization script
  • Companion Lua/Torch version exists for the historically inclined
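Because score mode produces one set of class scores per segment, getting a single label for a whole video takes an aggregation step. Below is a minimal sketch that averages per-clip scores and picks the top class. The record layout (a list of dicts with `segment` and `scores` keys) is hypothetical, since the actual JSON schema is not documented here, and plain averaging is a common choice rather than something the repo is confirmed to do.

```python
from collections import defaultdict


def video_level_label(clips):
    """Average per-clip class scores and return (label, mean_score).

    `clips` is a list of dicts, each with a "scores" mapping of
    class label -> float. This layout is an illustrative assumption,
    not the tool's documented output format.
    """
    if not clips:
        raise ValueError("no clips to aggregate")
    totals = defaultdict(float)
    for clip in clips:
        for label, score in clip["scores"].items():
            totals[label] += score
    best = max(totals, key=totals.get)
    return best, totals[best] / len(clips)


if __name__ == "__main__":
    # Made-up scores for three 16-frame segments of one video.
    example = [
        {"segment": [1, 16], "scores": {"archery": 0.7, "bowling": 0.3}},
        {"segment": [17, 32], "scores": {"archery": 0.4, "bowling": 0.6}},
        {"segment": [33, 48], "scores": {"archery": 0.8, "bowling": 0.2}},
    ]
    label, mean_score = video_level_label(example)
    print(label, round(mean_score, 3))
```

Averaging smooths over a single noisy segment (the middle clip above), but it also hides when an action occurs only briefly, so per-segment results are still worth keeping for temporal analysis.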

Caveats

  • Setup instructions reference PyTorch 0.x-era conda channels (soumith, cuda80) and FFmpeg 3.3.3; expect to adapt for modern environments
  • The README is sparse on input format specifics: resolution, codec compatibility, and the exact JSON schema are all left unstated
  • No mention of GPU memory requirements or batching behavior for long videos

Verdict

Useful if you need quick, off-the-shelf action recognition or video feature extraction without building a pipeline from scratch. Skip it if you need fine-grained temporal modeling, custom classes, or production-grade robustness: this is research code with research-code edges.
