An AI art department for scientists who can't draw
PaperBanana orchestrates five specialized agents to turn your method section into publication-ready diagrams—because not every researcher wants to battle PowerPoint at 2 AM.

What it does
PaperBanana takes raw scientific text (your method section and figure caption) and generates candidate diagrams or plots through a pipeline of specialized agents. You paste content, pick a pipeline mode, and get back up to 20 visual candidates you can refine or upscale to 2K/4K. It runs as a Gradio app, Streamlit demo, or CLI tool, with a live Hugging Face Spaces version that needs only an API key.
The interesting bit
The architecture mirrors an actual creative team: a Retriever finds similar published diagrams for inspiration, a Planner writes detailed visual descriptions, a Stylist enforces academic aesthetic standards, a Visualizer calls image-generation APIs, and a Critic loops back for iterative refinement. The authors explicitly note this is a fork of Google’s PaperVizAgent, repositioned as a fully open-source community project with no commercial intent.
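To make the team structure concrete, here is a minimal sketch of how such a five-stage flow with a Critic loop could be wired together. Everything in it is an assumption for illustration: the `Draft` class, the function names, the stubbed return values, and the choice to loop from the Critic back to the Visualizer are hypothetical, not PaperBanana's actual API. The real agents call LLM and image-generation services where these stubs return strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# NOTE: every name here is hypothetical; this is not PaperBanana's real code.


@dataclass
class Draft:
    description: str                      # visual description being refined
    image: str = ""                       # placeholder for a rendered image
    log: List[str] = field(default_factory=list)  # which agents have run


def retrieve(method_text: str) -> List[str]:
    """Stub Retriever: would look up similar published diagrams."""
    return ["reference-diagram-1", "reference-diagram-2"]


def plan(method_text: str, caption: str, references: List[str]) -> Draft:
    """Stub Planner: turns text plus references into a visual description."""
    desc = f"Diagram for '{caption}' informed by {len(references)} references"
    return Draft(description=desc, log=["retriever", "planner"])


def stylize(draft: Draft) -> Draft:
    """Stub Stylist: would enforce academic aesthetic conventions."""
    draft.description += "; academic style"
    draft.log.append("stylist")
    return draft


def visualize(draft: Draft) -> Draft:
    """Stub Visualizer: would call an image-generation API."""
    draft.image = f"<image rendered from: {draft.description}>"
    draft.log.append("visualizer")
    return draft


def critique(draft: Draft) -> Optional[str]:
    """Stub Critic: returns feedback, or None once the draft is acceptable."""
    draft.log.append("critic")
    if "revised" in draft.description:
        return None
    return "labels too small"


def run_pipeline(method_text: str, caption: str, max_rounds: int = 3) -> Draft:
    references = retrieve(method_text)
    draft = stylize(plan(method_text, caption, references))
    for _ in range(max_rounds):           # bounded refinement loop
        draft = visualize(draft)
        feedback = critique(draft)
        if feedback is None:
            break
        draft.description += f"; revised: {feedback}"
    return draft


result = run_pipeline("We train a two-stage encoder.", "Figure 1: Overview")
print(result.log)
```

The bound on `max_rounds` is a design choice in this sketch: each round would cost at least one image-generation call, so an unbounded Critic loop would make cost unpredictable.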
Key highlights
- Five-agent pipeline with optional ablation: run vanilla (direct generation), partial pipelines, or the full Retriever → Planner → Stylist → Visualizer → Critic stack
- Supports both conceptual diagrams and statistical plots (though plot-generation code is still pending per the TODO list)
- OpenRouter integration routes to OpenAI, Anthropic, and other providers; Google Gemini is also supported
- Graceful degradation: works without the reference dataset by bypassing the Retriever and the few-shot examples it would supply
- Built-in evaluation framework and pipeline visualization tools for inspecting intermediate agent outputs
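The ablation modes and the graceful-degradation behavior listed above can be pictured as a stage-selection step that runs before the pipeline does. The sketch below is an assumption-laden illustration: the mode names (`"vanilla"`, `"no_critic"`, `"full"`), the `build_stages` function, and the mapping of vanilla mode to a Visualizer-only run are invented here and may not match the project's actual options.

```python
from typing import List

# Hypothetical stage names, in the order the review describes them.
FULL_STACK = ["retriever", "planner", "stylist", "visualizer", "critic"]

# Hypothetical mode table; the real project may name or slice modes differently.
MODES = {
    "vanilla": ["visualizer"],            # assumed: direct generation only
    "no_critic": FULL_STACK[:-1],         # an example partial pipeline
    "full": FULL_STACK,
}


def build_stages(mode: str, has_reference_set: bool) -> List[str]:
    """Return the agent stages to run for a mode.

    If no reference dataset is available, the Retriever is dropped so the
    remaining stages still run (the graceful-degradation case).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    stages = list(MODES[mode])            # copy so the table is not mutated
    if not has_reference_set:
        stages = [s for s in stages if s != "retriever"]
    return stages


print(build_stages("full", has_reference_set=False))
```

Keeping the fallback in one place means the later stages never need to know whether reference examples were available.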
Caveats
- Several promised features remain on the TODO list: manual example selection UI, statistical plot generation code, and diagram improvement from style guidelines
- Reference set is currently computer-science-centric; expansion to other fields is planned but not yet delivered
- High-concurrency generation requires an API key that supports it—cost scaling is left as an exercise for the reader
Verdict
Worth a spin for ML/AI researchers who regularly need conceptual figures and would rather iterate on prompts than pixel-push in Illustrator. Skip it if you need production-grade reliability today, work outside CS, or balk at API costs for batch generation.