Turning screenshots into actual diagrams you can edit
Edit Banana uses SAM 3 and multimodal LLMs to reverse-engineer static images into editable DrawIO XML.

What it does
Upload a PNG or JPG of a flowchart, architecture diagram, or even a scientific formula. Edit Banana segments the image with a fine-tuned SAM 3, extracts text with local Tesseract OCR or, optionally, PaddleOCR, converts formulas to LaTeX through Pix2Text, and stitches everything into a DrawIO-compatible XML file. The result is a diagram in which every box, arrow, and label is independently selectable and editable: not a traced bitmap, but reconstructed vector logic.
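To make "DrawIO-compatible XML" concrete, here is a minimal sketch of how detected boxes and arrows could be serialized. It is not Edit Banana's code: the `build_drawio` function, the input dictionaries, and the chosen style strings are assumptions for illustration. Only the `mxfile` / `mxGraphModel` / `mxCell` layout follows the draw.io file format.

```python
import xml.etree.ElementTree as ET


def build_drawio(shapes, edges):
    """Serialize shapes and edges into a draw.io XML string.

    shapes: list of dicts with keys id, label, x, y, w, h (hypothetical schema)
    edges:  list of (source_id, target_id) tuples
    """
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")

    # draw.io expects two structural cells: the root (0) and the default layer (1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")

    for s in shapes:
        cell = ET.SubElement(
            root, "mxCell",
            id=s["id"], value=s["label"],
            style="rounded=0;whiteSpace=wrap;html=1;",
            vertex="1", parent="1",
        )
        # "as" is a Python keyword, so the attributes go in a dict.
        ET.SubElement(cell, "mxGeometry", {
            "x": str(s["x"]), "y": str(s["y"]),
            "width": str(s["w"]), "height": str(s["h"]),
            "as": "geometry",
        })

    for i, (src, dst) in enumerate(edges):
        cell = ET.SubElement(
            root, "mxCell",
            id=f"edge-{i}", style="endArrow=classic;html=1;",
            edge="1", parent="1", source=src, target=dst,
        )
        ET.SubElement(cell, "mxGeometry", {"relative": "1", "as": "geometry"})

    return ET.tostring(mxfile, encoding="unicode")


xml_text = build_drawio(
    shapes=[
        {"id": "a", "label": "Start", "x": 40, "y": 40, "w": 120, "h": 60},
        {"id": "b", "label": "Done", "x": 240, "y": 40, "w": 120, "h": 60},
    ],
    edges=[("a", "b")],
)
```

Because each shape and arrow becomes its own `mxCell`, the editor can select and move them individually, which is the difference from a traced bitmap.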
The interesting bit
The pipeline treats diagrams as structured content rather than flat images. SAM 3 handles element segmentation, while a “fixed multi-round VLM scanning” process guided by multimodal LLMs extracts relationships and layout. A crop-guided strategy sends high-resolution text regions to the formula engine, so formulas still convert cleanly to LaTeX even in dense technical schematics.
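One plausible reading of the crop-guided step is sketched below: a text box found on a downscaled detection image is mapped back to the full-resolution original, padded slightly, and clamped to the image bounds before being cropped for the formula engine. The function name, the padding ratio, and the assumption that detection runs at reduced resolution are all illustrative and do not come from the project.

```python
import math


def full_res_crop_box(box, detect_size, full_size, pad_ratio=0.1):
    """Map a (left, top, right, bottom) box from detection resolution to
    full resolution, add padding, and clamp to the image bounds.

    detect_size and full_size are (width, height) tuples.
    """
    scale_x = full_size[0] / detect_size[0]
    scale_y = full_size[1] / detect_size[1]

    left, top, right, bottom = box
    left, right = left * scale_x, right * scale_x
    top, bottom = top * scale_y, bottom * scale_y

    # Pad proportionally so thin glyphs at the edges are not clipped.
    pad_x = (right - left) * pad_ratio
    pad_y = (bottom - top) * pad_ratio

    return (
        max(0, math.floor(left - pad_x)),
        max(0, math.floor(top - pad_y)),
        min(full_size[0], math.ceil(right + pad_x)),
        min(full_size[1], math.ceil(bottom + pad_y)),
    )


# A box detected on a 1000x500 preview, cropped from the 4000x2000 original.
crop = full_res_crop_box((100, 50, 200, 100), (1000, 500), (4000, 2000), pad_ratio=0.25)
```

The resulting tuple has the same shape as the box argument to Pillow's `Image.crop`, so it could be handed to an image library directly.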
Key highlights
- Outputs native DrawIO XML, not SVG approximations or raster traces
- Local-first OCR runs offline; optional PaddleOCR for mixed-language text
- Pix2Text integration for mathematical formula recognition and LaTeX conversion
- FastAPI backend with multi-user concurrency, global locks for GPU access, and LRU caching of image embeddings
- CLI batch processing plus a hosted web demo at editbanana.net
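The concurrency bullet above combines two standard techniques: a global lock that serializes GPU work, and an LRU cache so that repeated requests for the same image skip the encoder. The sketch below shows how they fit together using only the standard library. `EmbeddingService`, `encode_fn`, and the capacity are hypothetical names, and the project's actual FastAPI wiring may differ.

```python
import threading
from collections import OrderedDict


class EmbeddingService:
    """Serialize encoder calls behind one lock and keep recent results."""

    def __init__(self, encode_fn, capacity=8):
        self._encode = encode_fn          # stand-in for the image encoder
        self._capacity = capacity
        self._cache = OrderedDict()       # oldest entry first
        self._gpu_lock = threading.Lock()

    def embed(self, image_key, image):
        # One lock guards both the cache and the GPU call. This is simple
        # and safe, at the cost of making cache hits wait for running work.
        with self._gpu_lock:
            if image_key in self._cache:
                self._cache.move_to_end(image_key)   # mark as recently used
                return self._cache[image_key]

            embedding = self._encode(image)
            self._cache[image_key] = embedding
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)      # evict least recently used
            return embedding


# Usage with a fake encoder that records how often it is called.
calls = []
service = EmbeddingService(lambda img: calls.append(img) or len(img), capacity=2)
service.embed("a", "xx")
service.embed("a", "xx")   # served from cache, encoder not called again
```

A finer-grained design would use separate locks for the cache and the GPU, but a single lock is easier to reason about when the encoder dominates request time.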
Caveats
- The GitHub repository is explicitly described as trailing the web service; the latest features are available online first
- Setup is involved: manual SAM 3 weight downloads, Tesseract or PaddleOCR installation, and config file editing, with CUDA recommended
- The README warns of GPU architecture mismatches and specific paddlepaddle version bugs (avoid 3.3.0)
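A quick pre-flight check can catch the flagged paddlepaddle release before a long setup session. The sketch below is illustrative rather than part of the project: the helper names are made up, and the only version treated as bad is the 3.3.0 mentioned above. It also looks for `paddlepaddle-gpu`, on the assumption that a GPU build may be installed under that distribution name.

```python
from importlib import metadata

KNOWN_BAD_VERSIONS = {"3.3.0"}  # the release the README says to avoid


def installed_paddle_version():
    """Return the installed paddlepaddle version, or None if it is absent."""
    for dist_name in ("paddlepaddle", "paddlepaddle-gpu"):
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue
    return None


def paddle_version_warning(installed):
    """Return a warning string for a flagged version, otherwise None."""
    if installed in KNOWN_BAD_VERSIONS:
        return f"paddlepaddle {installed} is flagged as buggy; install a different release."
    return None


warning = paddle_version_warning(installed_paddle_version())
if warning:
    print(warning)
```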
Verdict
Worth exploring if you regularly need to modify legacy diagrams trapped in PDFs or screenshots, and you can tolerate setup friction. If you just need occasional conversions, the hosted demo is the pragmatic entry point; skip local installation unless you need batch processing or air-gapped use.