California: Researchers at Google AI and Peking University on Thursday unveiled PaperBanana, a multi-agent framework that automates academic diagram generation for research papers.
The system targets a persistent bottleneck in scientific publishing: while AI systems increasingly handle literature reviews and coding, visualizing complex findings remains labor-intensive.
PaperBanana orchestrates five specialized agents across two phases. The Linear Planning Phase runs Retriever, Planner, and Stylist agents in sequence; the Iterative Refinement Phase loops Visualizer and Critic agents through three improvement rounds.
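The two-phase flow described above can be sketched as a simple pipeline. This is an illustrative outline, not the project's actual code: the agent names and the three-round refinement loop come from the article, while the class interfaces, method names, and stub behaviors are assumptions for demonstration.

```python
# Hypothetical sketch of PaperBanana's two-phase pipeline. Stub agents stand
# in for the real LLM-backed ones; only the agent roles and the three-round
# loop are taken from the article.

class Retriever:
    def find_examples(self, text):
        return [f"ref_{i}" for i in range(10)]  # article: 10 reference examples

class Planner:
    def describe_figure(self, text, refs):
        return f"plan({text}, {len(refs)} refs)"

class Stylist:
    def apply_style(self, plan, style):
        return f"{plan} in {style}"

class Visualizer:
    def render(self, plan, feedback=None):
        return f"image[{plan}]" + (f" fixed:{feedback}" if feedback else "")

class Critic:
    def inspect(self, image, source_text):
        return None  # None = no factual errors or visual glitches found

def generate_diagram(method_text, max_rounds=3):
    # Linear Planning Phase: Retriever -> Planner -> Stylist, run once.
    refs = Retriever().find_examples(method_text)
    plan = Planner().describe_figure(method_text, refs)
    styled = Stylist().apply_style(plan, style="NeurIPS Look")

    # Iterative Refinement Phase: up to three Visualizer/Critic rounds.
    image = Visualizer().render(styled)
    for _ in range(max_rounds):
        feedback = Critic().inspect(image, method_text)
        if feedback is None:
            break  # Critic found nothing to fix
        image = Visualizer().render(styled, feedback=feedback)
    return image
```

The key design point the sketch captures is that planning is linear and runs once, while rendering and critique form a feedback loop that terminates early when the Critic finds no issues.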
The Retriever Agent identifies 10 relevant reference examples from a database. The Planner Agent translates technical methodology text into a detailed figure description. The Stylist Agent ensures outputs match conference aesthetics such as the "NeurIPS Look".
The Visualizer Agent generates diagrams with Nano-Banana-Pro; for statistical plots, it instead writes executable Python Matplotlib code. The Critic Agent then inspects each image against the source text, flagging factual errors or visual glitches.
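The code-based route for statistical plots looks roughly like the script below: because the figure is rendered from the data itself, every plotted value is exact. This is an illustrative example of the approach, not output from the system, and the numbers are synthetic placeholders.

```python
# Illustrative Matplotlib script of the kind a code-writing Visualizer
# might emit for a statistical plot. Plotted values come directly from
# the data arrays, so there is no room for numerical hallucination.
# The scores below are made-up placeholder values.
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for automated pipelines
import matplotlib.pyplot as plt

methods = ["Baseline", "Proposed"]
scores = [61.2, 74.5]  # synthetic example data

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(methods, scores, color=["#999999", "#f4b400"])
ax.bar_label(bars, fmt="%.1f")  # labels read back exactly the plotted values
ax.set_ylabel("Overall score (%)")
ax.set_title("Example comparison plot")
fig.tight_layout()
fig.savefig("figure.png")
```

Each bar's height is set from the data array and the labels are derived from those same values, which is the fidelity guarantee that image-generation models cannot offer.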
Researchers also introduced PaperBananaBench, a dataset of 292 test cases drawn from NeurIPS 2025 publications. On it, PaperBanana outperformed baselines with a 17% higher overall score, including gains of 37.2% in conciseness, 12.9% in readability, and 6.6% in aesthetics.
The system excels at Agent and Reasoning diagrams, achieving a 69.9% overall score. For statistical plots, code-based generation ensures 100% data fidelity, whereas image-generation models are prone to numerical hallucinations.
Domain-specific aesthetic preferences vary significantly. Agent and Reasoning papers favor illustrative 2D vector robots and chat bubbles. Computer Vision research uses camera cones and point clouds. Generative Learning employs 3D cuboids for tensors. Theory papers maintain minimalist grayscale palettes.
The framework is available on GitHub with full documentation.
