HorizonWind2004/reconstruction-alignment
A self-supervised alignment method that enhances unified multimodal models' zero-shot performance across image generation and editing tasks.

The paper introduces Reconstruction Alignment (RecA), a self-supervised training technique that improves unified multimodal models by aligning visual and textual representations through a reconstruction objective. It has been validated on architectures including BAGEL, Janus, and Show-o, and training takes about 4.5 hours on six 80 GB A100 GPUs (roughly 27 GPU-hours). The authors report that RecA lifts BAGEL’s image editing performance above that of FLUX-Kontext, and the method is fully open-source and reproducible.
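To make the idea of a reconstruction objective concrete, here is a toy sketch in plain Python. It is not the authors' implementation. It assumes, for illustration only, that a frozen "understanding" encoder turns an image into an embedding and that a trainable generator is optimized to reconstruct the same image from that embedding, with no captions or labels involved. The names `encode`, `decode`, and `reconstruction_step`, the tiny linear maps, and the mean-squared-error loss are all hypothetical stand-ins for the real model components.

```python
def encode(image, enc_weights):
    """Frozen toy 'understanding' encoder: image vector -> embedding."""
    return [sum(w * x for w, x in zip(row, image)) for row in enc_weights]


def decode(embedding, dec_weights):
    """Trainable toy 'generator': embedding -> reconstructed image."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in dec_weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_step(image, enc_weights, dec_weights, lr=1.0):
    """One gradient step on the generator; returns the loss before the update.

    The image itself is the training target, so the step is self-supervised.
    Only dec_weights are updated (in place); the encoder stays frozen.
    """
    emb = encode(image, enc_weights)
    recon = decode(emb, dec_weights)
    loss = mse(recon, image)
    n = len(image)
    for i in range(n):
        # d(loss)/d(dec_weights[i][j]) = (2 / n) * (recon[i] - image[i]) * emb[j]
        grad_out = 2.0 / n * (recon[i] - image[i])
        for j in range(len(emb)):
            dec_weights[i][j] -= lr * grad_out * emb[j]
    return loss


# Hypothetical 4-"pixel" image and fixed 3x4 encoder weights (illustrative values).
image = [0.2, 0.8, 0.5, 0.1]
enc = [
    [0.5, -0.2, 0.1, 0.3],
    [0.1, 0.4, -0.3, 0.2],
    [-0.2, 0.1, 0.6, 0.0],
]
dec = [[0.0] * 3 for _ in range(4)]  # generator starts from zero weights

losses = [reconstruction_step(image, enc, dec) for _ in range(50)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

In this sketch the loss falls steadily because the only training signal is how well the generator reproduces its own input from the encoder's embedding. The real method operates on full unified multimodal models rather than small linear maps.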