dwzhu-pku/PaperBanana

An AI art department for scientists who can't draw

PaperBanana orchestrates five specialized agents to turn your method section into publication-ready diagrams—because not every researcher wants to battle PowerPoint at 2 AM.

6.5k stars · Python · Creative · Design · Domain Apps
Velocity (7d): +50 ★ / day · Trend: steady

What it does

PaperBanana takes raw scientific text (your method section and figure caption) and generates candidate diagrams or plots through a pipeline of specialized agents. You paste the content, pick a pipeline mode, and get back up to 20 visual candidates that you can refine or upscale to 2K/4K. It runs as a Gradio app, a Streamlit demo, or a CLI tool, and a live Hugging Face Spaces version needs only an API key.

The interesting bit

The architecture mirrors an actual creative team: a Retriever finds similar published diagrams for inspiration, a Planner writes detailed visual descriptions, a Stylist enforces academic aesthetic standards, a Visualizer calls image-generation APIs, and a Critic loops back for iterative refinement. The authors explicitly note this is a fork of Google’s PaperVizAgent, repositioned as a fully open-source community project with no commercial intent.
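
The hand-off between the five roles can be pictured with a minimal Python sketch. It is not PaperBanana's code: the `Draft` class, the stage functions, and `run_pipeline` are hypothetical names, every stage is a stub that manipulates strings instead of calling a model, and the choice to have the Critic send work back to the Visualizer (rather than to an earlier stage) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Draft:
    """Hypothetical state object passed from one agent stage to the next."""
    method_text: str
    caption: str
    references: List[str] = field(default_factory=list)
    description: str = ""
    image: str = ""  # stand-in for image bytes or a file path
    feedback: List[str] = field(default_factory=list)


def retriever(d: Draft) -> Draft:
    # Stub: a real retriever would search a set of published diagrams.
    d.references = ["reference diagram A", "reference diagram B"]
    return d


def planner(d: Draft) -> Draft:
    # Stub: turn the text plus references into a visual description.
    d.description = f"Diagram for '{d.caption}' informed by {len(d.references)} references"
    return d


def stylist(d: Draft) -> Draft:
    # Stub: append style constraints to the description.
    d.description += " | academic style"
    return d


def visualizer(d: Draft) -> Draft:
    # Stub: a real visualizer would call an image-generation API here.
    d.image = f"<image rendered from: {d.description}>"
    return d


def critic(d: Draft) -> bool:
    """Return True to accept the draft; otherwise record feedback and revise."""
    if d.feedback:  # stub policy: accept once one revision has been requested
        return True
    d.feedback.append("increase label contrast")
    d.description += " | revised: increase label contrast"
    return False


def run_pipeline(method_text: str, caption: str, max_rounds: int = 3) -> Draft:
    d = Draft(method_text, caption)
    for stage in (retriever, planner, stylist):
        d = stage(d)
    # Assumed refinement loop: render, critique, re-render until accepted.
    for _ in range(max_rounds):
        d = visualizer(d)
        if critic(d):
            break
    return d


result = run_pipeline("We train a two-tower encoder ...", "Overview of the method")
print(result.image)
```

With these stubs the loop renders twice: the first render is rejected with one piece of feedback, and the second render, built from the revised description, is accepted.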

Key highlights

  • Five-agent pipeline with optional ablation: run vanilla (direct generation), partial pipelines, or the full Retriever → Planner → Stylist → Visualizer → Critic stack
  • Supports both conceptual diagrams and statistical plots (though plot-generation code is still pending per the TODO list)
  • OpenRouter integration routes to OpenAI, Anthropic, and other providers; Google Gemini also supported
  • Graceful degradation: works without the reference dataset by bypassing the Retriever’s few-shot learning
  • Built-in evaluation framework and pipeline visualization tools for inspecting intermediate agent outputs
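
The ablation and graceful-degradation bullets above amount to choosing which stages run. A small sketch of that selection logic follows, under stated assumptions: `select_stages`, its parameters, and the lowercase stage names are hypothetical and do not come from the repository, and the Visualizer is assumed to run in every mode, since direct generation still needs an image call.

```python
from typing import Iterable, List

# Stage order taken from the review text; the lowercase identifiers are hypothetical.
FULL_STACK = ["retriever", "planner", "stylist", "visualizer", "critic"]


def select_stages(enabled: Iterable[str], reference_set_available: bool = True) -> List[str]:
    """Return the stages to run, in pipeline order.

    The visualizer is always kept (assumption: even 'vanilla' mode must
    produce an image). The retriever is dropped when no reference dataset
    is present, which mirrors the graceful-degradation behaviour described
    in the text.
    """
    wanted = set(enabled) | {"visualizer"}
    unknown = wanted - set(FULL_STACK)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    if not reference_set_available:
        wanted.discard("retriever")
    return [s for s in FULL_STACK if s in wanted]


print(select_stages([]))                                      # vanilla: direct generation
print(select_stages(["planner", "critic"]))                   # a partial pipeline
print(select_stages(FULL_STACK, reference_set_available=False))  # full stack, no dataset
```

Filtering against the ordered `FULL_STACK` list keeps the stage order fixed regardless of the order in which stages are enabled.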

Caveats

  • Several promised features remain on the TODO list: manual example selection UI, statistical plot generation code, and diagram improvement from style guidelines
  • Reference set is currently computer-science-centric; expansion to other fields is planned but not yet delivered
  • High-concurrency generation requires an API key that supports it—cost scaling is left as an exercise for the reader
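
As a starting point for that exercise, here is a back-of-envelope estimate of image-generation calls. It rests on an assumption the source does not confirm: that each candidate is rendered once and then re-rendered once per Critic round. The function names are hypothetical, and the per-image price is a parameter you would have to supply from your provider's pricing.

```python
def estimate_image_calls(candidates: int, critic_rounds: int) -> int:
    """Rough upper bound on image API calls.

    Assumes one initial render per candidate plus one re-render per
    critic round (an assumption, not documented behaviour).
    """
    if candidates < 0 or critic_rounds < 0:
        raise ValueError("candidates and critic_rounds must be non-negative")
    return candidates * (1 + critic_rounds)


def estimate_cost(candidates: int, critic_rounds: int, price_per_image: float) -> float:
    """Multiply the call count by a caller-supplied price per image."""
    return estimate_image_calls(candidates, critic_rounds) * price_per_image


# 20 candidates (the maximum mentioned in the text) with three critic rounds:
print(estimate_image_calls(20, 3))
```

Under that assumption, 20 candidates with three refinement rounds means 80 image calls, before counting any text-model calls made by the other agents.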

Verdict

Worth a spin for ML/AI researchers who regularly need conceptual figures and would rather iterate on prompts than pixel-push in Illustrator. Skip it if you need production-grade reliability today, work outside CS, or balk at API costs for batch generation.
