NIPS 2016: when GANs learned to hallucinate tiny beaches
A Torch7 implementation that generates short, plausible video clips by separating foreground motion from static backgrounds using adversarial training.

What it does
This is the original implementation of a 2016 NIPS paper that trains generative adversarial networks to produce tiny videos, such as 32-frame clips of beaches, golf swings, or train stations. The model learns scene dynamics from stabilized video data and outputs GIF-worthy hallucinations that are not photo-realistic but move in roughly the right ways for their category.
The interesting bit
The generator composes each clip from a static background image and a moving foreground, blended by a learned mask. It is a simple but effective inductive bias that keeps the model from having to reinvent static scenery every frame. According to the README, the code also outputs these intermediate layers (mask, background) for manual inspection, which is rarer than it should be in generative codebases.
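The decomposition amounts to a per-pixel blend of two streams. Here is a minimal NumPy sketch, assuming the mask takes values in [0, 1] and the background is a single image broadcast across time. The function name, array shapes, and random stand-in data are illustrative and not taken from the repository.

```python
import numpy as np

def composite(foreground, mask, background):
    """Blend a moving foreground over a static background.

    foreground: (T, H, W, C) video from the foreground stream
    mask:       (T, H, W, 1) values in [0, 1]; 1 selects the foreground
    background: (H, W, C) single image, reused for every frame
    """
    # Broadcasting replicates the background across all T frames.
    return mask * foreground + (1.0 - mask) * background[np.newaxis]

rng = np.random.default_rng(0)
T, H, W, C = 32, 64, 64, 3  # hypothetical sizes, for illustration only
foreground = rng.uniform(-1, 1, size=(T, H, W, C))
background = rng.uniform(-1, 1, size=(H, W, C))
mask = rng.uniform(0, 1, size=(T, H, W, 1))

video = composite(foreground, mask, background)
```

Wherever the mask is 0 the pixel comes entirely from the background, so the generator only has to synthesize the parts of the scene that move.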
Key highlights
- Based on DCGAN and extended to 3D convolutions for spatiotemporal generation
- Conditional variant (main_conditional.lua) takes a static image and animates it
- Pre-trained models available as a 1 GB download
- Data pipeline assumes offline video stabilization, with helper code in the extra directory
- Outputs intermediate layers for debugging the foreground/background decomposition
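To see how stacked 3D transposed convolutions can grow a small volume into a clip, the standard output-size formula can be applied to each axis. The sketch below assumes kernel size 4, stride 2, and padding 1 (a DCGAN-style setting that doubles each dimension) and a hypothetical 2x4x4 starting volume; the Lua source is the authority on the layer settings actually used.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel

def upsample_shapes(start, layers):
    """Apply the same transposed convolution to (time, height, width)."""
    shapes = [start]
    for _ in range(layers):
        shapes.append(tuple(deconv_out(s) for s in shapes[-1]))
    return shapes

# Hypothetical starting volume: 2 frames of 4x4 feature maps.
shapes = upsample_shapes((2, 4, 4), layers=4)
for t, h, w in shapes:
    print(f"{t} frames of {h}x{w}")
```

Under these assumptions, four layers take the 2x4x4 volume to 32 frames of 64x64.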
Caveats
- Requires Torch7, which is effectively a legacy framework at this point
- The authors note the results are “not photo-realistic”, so manage your expectations
- Data prep is involved: videos must be stabilized, flattened into vertically concatenated JPEGs, and listed in a text file before training
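The flattened layout is easiest to picture with arrays. The sketch below assumes frames of equal size. The helper names, the clip dimensions, and the one-path-per-line list format are hypothetical, and the JPEG encoding itself (which would need an image library) is left out.

```python
import numpy as np

def flatten_clip(frames):
    """Stack (T, H, W, C) frames top to bottom into one (T*H, W, C) image."""
    t, h, w, c = frames.shape
    return frames.reshape(t * h, w, c)

def unflatten_clip(image, frame_height):
    """Recover (T, H, W, C) frames from a vertically stacked image."""
    th, w, c = image.shape
    assert th % frame_height == 0, "image height must be a multiple of frame height"
    return image.reshape(th // frame_height, frame_height, w, c)

def write_listing(paths, listing_path):
    """Write one image path per line (assumed list-file format)."""
    with open(listing_path, "w") as f:
        f.write("\n".join(paths) + "\n")

clip = np.zeros((32, 64, 64, 3), dtype=np.uint8)  # hypothetical clip size
clip[5] = 255  # mark one frame so the round trip can be checked
tall = flatten_clip(clip)
restored = unflatten_clip(tall, frame_height=64)
```

A loader that knows the frame height can slice the tall image back into frames, which is why a single image file per clip is enough.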
Verdict
Worth a look for researchers studying the archaeology of video generation or anyone who needs a concrete, interpretable baseline for foreground/background disentanglement. Skip it if you want something that runs in modern PyTorch out of the box.