Teaching neural networks to Photoshop themselves
A 2016 CVPR paper that learned unsupervised visual features by training GANs to fill in erased chunks of images.

What it does
Context Encoders train a neural network to reconstruct missing regions of an image—either a fixed center patch or random holes—using a combination of pixel-level reconstruction loss and adversarial training. The twist: this inpainting task forces the network to learn meaningful visual features without any labeled data.
The interesting bit
The project treats image completion as a feature learning problem, not just a graphics trick. By squeezing context through a bottleneck and demanding plausible reconstruction, the encoder picks up semantic representations that transfer to other vision tasks. It’s essentially making the network play a constrained game of visual Mad Libs.
Key highlights
- Joint training with reconstruction + adversarial loss; the adversarial part keeps outputs from getting blurry
- Supports both center-region and arbitrary random-region inpainting
- Pre-trained models available for Paris Street-View and ImageNet
- Includes Caffe feature extraction models for downstream use
- Forked from Soumith Chintala’s DCGAN.torch implementation (this is 2016-era Torch/Lua, not PyTorch)
Caveats
- Requires the now-legacy Torch framework; you’ll need
thand Lua dependencies - The TensorFlow reimplementation linked in the README is noted to lack “full functionalities”
- Paris Street-View dataset requires emailing the author for a private link due to Google Street View policy restrictions
Verdict
Worth studying if you’re tracing the lineage of self-supervised learning or modern inpainting (LaMa, Stable Diffusion infill, etc.). Skip if you need production-ready code; this is a research artifact from the GAN boom, not a maintained library.