Is actionformer_release open source?

Yes — happyharrycn/actionformer_release is open source, released under the MIT license.

What language is actionformer_release written in?

happyharrycn/actionformer_release is primarily written in Python.

How popular is actionformer_release?

happyharrycn/actionformer_release has 571 stars on GitHub.

Where can I find actionformer_release?

happyharrycn/actionformer_release is on GitHub at https://github.com/happyharrycn/actionformer_release.

← all repositories

happyharrycn/actionformer_release

A Transformer That Actually Watches the Clock

Because temporal action localization was still chained to anchor windows and proposals, ActionFormer uses local self-attention to find action boundaries directly.

★571 stars Python Computer Vision

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

ActionFormer is a Transformer-based model for temporal action localization: it finds the start and end times of action instances in untrimmed videos and labels their categories. It processes pre-extracted video features—such as I3D or TSP embeddings—using local self-attention to capture temporal context, then classifies each moment and regresses its boundaries in a single forward pass. The design drops the usual action proposals and pre-defined anchor windows entirely.

The interesting bit

Instead of chaining together action proposals and pre-defined anchor windows, the model treats temporal localization as a direct classification and regression problem using standard losses. The payoff is a large margin on established benchmarks: 71.0% mAP at tIoU=0.5 on THUMOS14, which the authors report as 14.1 percentage points above the prior best and the first result to cross 60% on that dataset.

Key highlights

Hits 71.0% mAP at tIoU=0.5 on THUMOS14, 36.56% average mAP on ActivityNet 1.3, and +13.5% average mAP on EPIC-Kitchens 100 over prior works
Proposal-free, single-shot design trained with standard classification and regression losses
Backed multiple winning solutions in the Ego4D Moment Queries Challenge 2022; the authors’ own entry placed second with 21.76% average mAP
Training requires roughly 4.5 GB of GPU memory, though inference demands more than 10 GB
Code structure follows Detectron2 conventions

Caveats

Inference memory requirements (over 10 GB) outpace training (~4.5 GB), so a 12 GB GPU is effectively the floor
Expects pre-extracted features, not raw video; you will need an upstream I3D or TSP feature pipeline

Verdict

A solid starting point for researchers in temporal action localization who want a strong, proposal-free baseline with published checkpoints. Skip it if you need an end-to-end system that consumes raw video frames.

Frequently asked

What is happyharrycn/actionformer_release?: Because temporal action localization was still chained to anchor windows and proposals, ActionFormer uses local self-attention to find action boundaries directly.
Is actionformer_release open source?: Yes — happyharrycn/actionformer_release is open source, released under the MIT license.
What language is actionformer_release written in?: happyharrycn/actionformer_release is primarily written in Python.
How popular is actionformer_release?: happyharrycn/actionformer_release has 571 stars on GitHub.
Where can I find actionformer_release?: happyharrycn/actionformer_release is on GitHub at https://github.com/happyharrycn/actionformer_release.