Your napkin sketch, now with more HTML than you expected
A Keras model that treats wireframe drawings as images to caption—with markup as the caption language.

What it does
SketchCode takes hand-drawn website mockups and generates working HTML from them. Feed it a PNG of your wireframe, and it spits out markup via an image-captioning pipeline built in Keras. The project ships with pretrained weights and a synthetic dataset of 1,700 images, so you can run inference without training from scratch.
The interesting bit
The twist is architectural: instead of treating layout generation as a vision problem or a structured prediction problem, it frames it as language generation—predicting HTML tokens sequentially, conditioned on a visual encoder. It’s the same trick that produces “a dog playing frisbee” from a photo, except the vocabulary here is <div> and class="header".
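To make "predicting HTML tokens sequentially" concrete, here is a minimal sketch of a greedy decoding loop. It is not SketchCode's actual code: `greedy_decode`, `toy_scorer`, the `<START>`/`<END>` markers, and the tiny vocabulary are all made up for illustration, and the scorer is a stand-in that ignores the image where a trained model would condition on the encoder's features.

```python
# Hypothetical sketch of caption-style decoding; every name here is illustrative.
START, END = "<START>", "<END>"
VOCAB = [START, END, "<div", 'class="header"', ">", "</div>"]


def greedy_decode(image_features, next_token_scores, vocab, max_len=150):
    """Emit one token at a time, always taking the highest-scoring candidate."""
    tokens = [START]
    for _ in range(max_len):
        # A real model would score every vocabulary entry given the image
        # features and the tokens generated so far.
        scores = next_token_scores(image_features, tokens)
        best = max(range(len(vocab)), key=lambda i: scores[i])
        if vocab[best] == END:
            break
        tokens.append(vocab[best])
    return tokens[1:]  # drop the START marker


def toy_scorer(image_features, tokens):
    """Stand-in for a trained network: replays a fixed script, ignoring the image."""
    script = ["<div", 'class="header"', ">", "</div>", END]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return [1.0 if tok == target else 0.0 for tok in VOCAB]


markup_tokens = greedy_decode(None, toy_scorer, VOCAB)
print(" ".join(markup_tokens))
```

Swap the toy scorer for a network that maps (image features, token prefix) to a score per vocabulary entry and the loop stays the same, which is the whole appeal of borrowing the captioning recipe.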
Key highlights
- Pretrained model and 342MB dataset available via shell scripts (get_data.sh, get_pretrained_model.sh)
- Batch or single-image conversion via CLI
- Training supports Keras ImageDataGenerator augmentation
- Evaluation uses BLEU score against ground-truth GUI files
- Builds directly on prior work: Beltramelli’s pix2code architecture and Wallner’s screenshot-to-code dataset
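For readers who have not met BLEU, the sketch below shows roughly what scoring predicted tokens against a ground-truth sequence involves. It is a simplified, unsmoothed, single-reference version written for illustration; the project may well compute its score differently (for example through a library), and the GUI tokens in the usage lines are invented.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU for one reference: clipped n-gram precision x brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # Penalize candidates that are shorter than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


# Invented token sequences, purely for illustration.
truth = ["header", "{", "btn", "btn", "}", "row", "{", "text", "}"]
prediction = ["header", "{", "btn", "btn", "}", "row", "{", "}"]
score = simple_bleu(truth, prediction)
```

A perfect prediction scores 1.0, a prediction with no overlapping 4-grams scores 0.0 under this unsmoothed variant, and anything in between reflects how much of the token sequence was reproduced in order.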
Caveats
- Explicitly a proof-of-concept; the README warns it doesn’t generalize to the “variability of sketches seen in actual wireframes”
- Pinned to TensorFlow 1.1.0 under Python 3, a dependency stack that is increasingly archaeological
- Performance depends on wireframes resembling the core synthetic dataset
Verdict
Worth a look if you’re exploring neural approaches to structured output or teaching image-captioning concepts with a tangible, visual result. Skip it if you need production-grade design-to-code tooling; the authors are upfront that this isn’t it.