Paint by graph: generating images from structured relationships
A 2018 CVPR paper that turns scene graphs—nodes for objects, edges for relationships—into actual images you can manipulate by editing the graph.

What it does
sg2im takes a scene graph (think: “sheep left of sheep riding sheep near sheep”) and generates a corresponding image. You write JSON describing objects and their relationships; the model outputs pixels. The README shows sheep arrangements that mutate as the graph changes—useful for controlled image synthesis where you want fine-grained control over composition without hand-drawing.
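To make the input concrete, here is a minimal sketch of building such a graph in Python and serializing it to JSON. It assumes each graph is an object with an `objects` list and a `relationships` list of `[subject_index, predicate, object_index]` triples. The helper `make_scene_graph` and the specific object and predicate names are hypothetical, so check them against the sample JSON files in the repo before relying on them.

```python
import json


def make_scene_graph(objects, triples):
    """Build one scene graph dict (hypothetical helper, not part of sg2im).

    objects: list of category names.
    triples: iterable of (subject_idx, predicate, object_idx), where the
             indices point into `objects`.
    """
    for s, _, o in triples:
        if not (0 <= s < len(objects) and 0 <= o < len(objects)):
            raise ValueError(f"relationship index out of range: {(s, o)}")
    return {
        "objects": list(objects),
        "relationships": [[s, p, o] for s, p, o in triples],
    }


# Assumed layout: a JSON file holds a list of graphs, one image per graph.
graph = make_scene_graph(
    ["sky", "grass", "sheep", "sheep"],
    [(0, "above", 1), (2, "standing on", 1), (3, "left of", 2)],
)
scene_graphs_json = json.dumps([graph], indent=2)
```

Referring to objects by index rather than by name is what lets a graph hold several sheep at once: each one is a separate node with its own relationships.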
The interesting bit
The pipeline is deliberately staged: a graph convolution network first computes object embeddings, then predicts bounding boxes and masks to build a coarse “scene layout,” and only then does a cascaded refinement network upscale to the final image. It’s image generation via intermediate structure rather than direct text-to-pixels—more architect’s blueprint than painter’s impulse.
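The middle stage, turning per-object embeddings, boxes, and masks into a coarse layout, can be sketched roughly as follows. This is an illustration in NumPy, not the repo's PyTorch code: the function name `compose_layout`, the `(x0, y0, x1, y1)` box convention in the range [0, 1], and the nearest-neighbor mask resizing are all assumptions made to keep the example short.

```python
import numpy as np


def compose_layout(embeddings, boxes, masks, size):
    """Paste mask-gated object embeddings into a (D, size, size) layout.

    embeddings: (N, D) array, one vector per object.
    boxes:      (N, 4) array of x0, y0, x1, y1 in [0, 1] (assumed convention).
    masks:      (N, M, M) array of soft masks in [0, 1].
    """
    d = embeddings.shape[1]
    m = masks.shape[1]
    layout = np.zeros((d, size, size), dtype=np.float32)
    for emb, (x0, y0, x1, y1), mask in zip(embeddings, boxes, masks):
        # Convert the normalized box to pixels, keeping at least 1x1 pixels.
        px0 = min(int(x0 * size), size - 1)
        py0 = min(int(y0 * size), size - 1)
        px1 = min(max(px0 + 1, int(round(x1 * size))), size)
        py1 = min(max(py0 + 1, int(round(y1 * size))), size)
        h, w = py1 - py0, px1 - px0
        # Nearest-neighbor resize of the M x M mask to the box size
        # (a simplification; a trainable model needs a differentiable warp).
        rows = np.arange(h) * m // h
        cols = np.arange(w) * m // w
        patch = mask[np.ix_(rows, cols)]
        # Overlapping objects sum their contributions.
        layout[:, py0:py1, px0:px1] += emb[:, None, None] * patch
    return layout


# Two objects with one-hot embeddings, each filling half of an 8x8 canvas.
demo = compose_layout(
    np.eye(2),
    np.array([[0.0, 0.0, 0.5, 1.0], [0.5, 0.0, 1.0, 1.0]]),
    np.ones((2, 4, 4)),
    size=8,
)
```

The resulting tensor is the "blueprint": each spatial location carries the embedding of whatever object occupies it, and the image generator only has to fill in appearance.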
Key highlights
- Pretrained models included: COCO-Stuff and Visual Genome at 64×64, plus Visual Genome at 128×128 (~355 MB download)
- Ablation study models available (12 more, ~1.25 GB) for dissecting which components matter
- Simple JSON input format; scripts/run_model.py handles CPU/GPU and optional GraphViz rendering of the graph itself
- Full training instructions in separate TRAINING.md
- Reproducibility: the provided JSONs and checkpoints recreate Figures 5 and 6 from the paper
Caveats
- Frozen in 2018: Python 3.5, PyTorch 0.4, Ubuntu 16.04—expect dependency archaeology
- Not an officially supported Google product (stated explicitly)
- Output resolution tops out at 128×128; this is research code, not a production renderer
Verdict
Worth a look if you’re studying structured generation, layout-conditioned image synthesis, or the pre-diffusion history of controllable image generation. Skip it if you need modern resolution, maintained dependencies, or a plug-and-play API.