Fixing blurry faces with a learned dictionary of face parts
A NeurIPS 2022 face restoration model that treats facial features as entries in a discrete codebook, making it harder to hallucinate implausible noses.

What it does CodeFormer restores degraded faces in photos and videos—low resolution, blur, compression artifacts, even color fading. It also handles face inpainting and colorization for cropped, aligned 512×512 face images. The model can process whole images (detecting and enhancing faces automatically) or video clips via ffmpeg, with optional background upscaling through Real-ESRGAN.
The interesting bit The “codebook lookup” part is the hook: instead of generating facial features from scratch, the model uses a transformer to predict which entries in a learned dictionary of quantized face codes best match the degraded input, then decodes the face from those entries. This constrains the restoration to plausible facial geometry, which is useful when you don’t want the model to invent a new chin because the input was too mangled.
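The lookup idea can be sketched with plain nearest-neighbour vector quantization. This is not CodeFormer's code: the function name, the tiny 2-D codebook, and the sample features are made up for illustration, and in the actual model a transformer predicts the code indices instead of relying on a raw distance search. The point is only that every output vector must be one of the dictionary entries.

```python
import numpy as np


def nearest_code_lookup(features, codebook):
    """Map each feature vector to its closest codebook entry.

    features: array of shape (n, d), e.g. encoder outputs for n positions
    codebook: array of shape (k, d), the learned dictionary of k codes
    Returns (indices, quantized) with shapes (n,) and (n, d).
    """
    # Pairwise squared Euclidean distances, shape (n, k), via broadcasting.
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # For each feature, pick the index of the nearest code.
    indices = np.argmin(dists, axis=1)
    # Replace each feature with the code it was matched to.
    return indices, codebook[indices]


# Hypothetical toy data: three codes and three noisy features.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
degraded = np.array([[0.9, 1.2], [3.5, -0.2], [0.1, -0.1]])

indices, quantized = nearest_code_lookup(degraded, codebook)
print(indices.tolist())  # [1, 2, 0]
```

Because the output is always drawn from the codebook, a badly corrupted feature still lands on some valid entry, which is the intuition behind the "harder to hallucinate" claim.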
Key highlights
- Adjustable fidelity-vs-quality tradeoff via a single weight parameter w in [0, 1]
- Supports whole-image enhancement, cropped face restoration, video processing, colorization, and inpainting
- Training code and configs released April 2023; pre-trained models downloadable via script
- Integrated into Stable Diffusion WebUI, ComfyUI, ChaiNNer, and numerous third-party APIs
- Official demos on Hugging Face, Replicate, and OpenXLab; authors explicitly warn against unofficial sites charging for access
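As a rough sketch of how the fidelity weight and the processing modes fit together on the command line, the helper below assembles an argument list for the repository's inference script. The helper name and the example path are hypothetical, and the flags (`-w`, `--input_path`, `--has_aligned`, `--bg_upsampler`, `--face_upsample`) are written from recollection of the project README, so check them against the script's `--help` output before relying on them.

```python
import shlex


def build_codeformer_command(input_path, fidelity_weight=0.5,
                             has_aligned=False, bg_upsampler=None,
                             face_upsample=False):
    """Build an argv list for CodeFormer's inference script (hypothetical helper)."""
    # The weight w is documented as lying in [0, 1].
    if not 0.0 <= fidelity_weight <= 1.0:
        raise ValueError("fidelity_weight must be in [0, 1]")
    cmd = ["python", "inference_codeformer.py",
           "-w", str(fidelity_weight),
           "--input_path", input_path]
    if has_aligned:
        # Input is already a cropped, aligned face, so no detection or fusion.
        cmd.append("--has_aligned")
    if bg_upsampler:
        # Assumed value "realesrgan" enables Real-ESRGAN for the background.
        cmd += ["--bg_upsampler", bg_upsampler]
    if face_upsample:
        cmd.append("--face_upsample")
    return cmd


# Whole-image mode with background upscaling; the path is a placeholder.
cmd = build_codeformer_command("inputs/whole_imgs", 0.7,
                               bg_upsampler="realesrgan")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` from inside a checkout of the repository; the printed string is only for inspection.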
Caveats
- Whole-image mode fuses the restored face back into the background, which can damage hair texture at the boundary; for fair academic comparison, the authors recommend running on cropped, aligned faces instead
- Background enhancement appears struck through in the TODO list; in practice it is handled by the optional Real-ESRGAN upscaler, not by CodeFormer itself
- Custom NTU S-Lab License 1.0, not a standard open-source license; redistribution has specific terms
Verdict Worth a look if you’re building photo restoration pipelines, working with archival video, or need a drop-in face enhancer for a diffusion workflow. Skip if you need general image restoration without faces—this model is specialized and the architecture overhead is only justified when facial fidelity matters.