Two Bottlenecks, Two Agents
Academic publishing has two notorious chokepoints: creating publication-ready figures and surviving peer review. Google Research just shipped a dedicated AI agent for each. PaperVizAgent generates publication-quality scientific diagrams from text descriptions; ScholarPeer produces literature-grounded peer reviews. Neither is a chatbot wrapper. Both are orchestrated multi-agent pipelines targeting the parts of the research workflow that text generation alone cannot handle.
The timing matters. Submission volumes at top conferences have roughly doubled over five years, reviewer fatigue is acute, and most researchers still spend significant time iterating on figures manually. These aren’t peripheral problems — they’re the friction between a completed experiment and a published result.
PaperVizAgent: From Text to Publication-Ready Figures
Creating a methodology diagram or a statistical plot for a top-tier venue is harder than it sounds. The figure needs to match the paper’s notation, reflect the experimental setup precisely, and meet journal style requirements — all at once. Asking a general-purpose LLM to satisfy all three constraints in a single pass has not worked reliably.
PaperVizAgent (also released as PaperBanana in open-source form on GitHub) breaks the task into five specialized agents: a Retriever pulls relevant examples from the literature; a Planner outlines the figure structure; a Stylist applies domain-appropriate visual conventions; a Visualizer renders the output; and a Critic evaluates and requests revisions. Each pass through the Critic loop refines the result until it meets quality thresholds.
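The announcement doesn’t document the agents’ interfaces, but the control flow is easy to picture. Below is a minimal Python sketch of that Retriever → Planner → Stylist → Visualizer → Critic loop; every name and signature in it is illustrative, not the actual PaperBanana API.

```python
# Minimal sketch of a five-agent figure pipeline with a critic loop.
# All names and signatures are hypothetical, not the PaperBanana API.
from dataclasses import dataclass

@dataclass
class Critique:
    passed: bool   # did the figure clear the quality threshold?
    feedback: str  # revision notes fed back into the next pass

def generate_figure(description, retriever, planner, stylist, visualizer,
                    critic, max_rounds=3):
    """Run the pipeline, looping through the critic until it approves."""
    examples = retriever(description)       # reference figures from the literature
    plan = planner(description, examples)   # structure: panels, axes, notation
    figure, feedback = None, None
    for _ in range(max_rounds):
        if feedback:                        # fold the critic's notes back in
            plan = planner(description, examples, feedback=feedback)
        styled = stylist(plan)              # domain-appropriate visual conventions
        figure = visualizer(styled)         # render the actual image
        critique = critic(figure, description)
        if critique.passed:
            return figure
        feedback = critique.feedback
    return figure                           # best effort after max_rounds
```

The point of the loop is that quality control lives inside the pipeline rather than with the user: the Critic’s feedback re-enters the Planner, which is what lets the system converge on notation and style constraints that a single generation pass tends to miss.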
How the Benchmark Reads
On PaperBananaBench — a human-evaluated benchmark that rates figures across five criteria including clarity, accuracy, and style adherence — PaperVizAgent scores 60.2, against a human baseline of 50.0. It outperforms all tested baselines, including GPT-Image-1.5 and Paper2Any. That said, the benchmark is human-evaluated and subjective; what the number captures is that domain experts consistently rate PaperVizAgent outputs as better than what comparable tools produce, not that the tool is infallible.
The codebase is open source. Researchers can run it locally or extend the pipeline for domain-specific figure types. That openness distinguishes it from most commercial AI research tools and lowers the barrier to institutional adoption.
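Extending the pipeline could then be as simple as swapping one agent. Continuing the hypothetical sketch above (none of this reflects the real PaperBanana codebase), a lab might plug in a domain-specific stylist while stubbing out the model-backed pieces:

```python
# Hypothetical extension: a domain-specific stylist plugged into the
# generate_figure sketch above. The stubs stand in for model-backed agents.
def neuroimaging_stylist(plan):
    plan["colormap"] = "gray"               # radiology convention: grayscale anatomy
    plan["orientation_labels"] = ["L", "R"] # left/right markers on axial slices
    return plan

stub_retriever  = lambda desc: []                               # no reference figures
stub_planner    = lambda desc, ex, feedback=None: {"panels": 1}
stub_visualizer = lambda plan: f"<rendered figure: {plan}>"
stub_critic     = lambda fig, desc: Critique(passed=True, feedback="")

figure = generate_figure("Axial fMRI activation map in MNI space",
                         stub_retriever, stub_planner, neuroimaging_stylist,
                         stub_visualizer, stub_critic)
```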
ScholarPeer: Peer Review That Actually Reads the Literature
The peer review crisis is structural: the pool of qualified reviewers grows linearly while submissions grow exponentially. Automated reviewing tools have existed for years, but they have largely stopped at surface-level text analysis — checking grammar, flagging vague claims, noting missing citations. ScholarPeer attempts something more ambitious.
Its architecture is built around a dual-stream process. One stream handles context acquisition: a sub-domain historian agent dynamically builds a narrative of the relevant literature by querying live web-scale sources, grounding the review in what has actually been published, not just what the model was trained on. The second stream handles active verification: a baseline scout agent acts as an adversarial auditor, specifically hunting for datasets, baselines, or prior results that the paper’s authors may have omitted. A multi-aspect Q&A engine then verifies the paper’s technical claims against that accumulated context.
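To make the two streams concrete, here is a rough asyncio sketch of that shape. The agent objects, their methods, and the paper structure are all assumptions, not ScholarPeer’s published interfaces.

```python
# Rough sketch of ScholarPeer's dual-stream review, as described in the
# announcement. Agent objects, methods, and fields are all hypothetical.
import asyncio

async def review_paper(paper, historian, scout, qa_engine):
    # Stream 1 (context acquisition): the historian builds a narrative of
    # the relevant literature by querying live web-scale sources.
    # Stream 2 (active verification): the scout adversarially hunts for
    # baselines, datasets, or prior results the paper omits.
    context, omissions = await asyncio.gather(
        historian.build_narrative(paper),
        scout.find_omissions(paper),
    )
    # The multi-aspect Q&A engine then checks each technical claim
    # against the accumulated context.
    checks = [await qa_engine.verify(claim, context) for claim in paper.claims]
    return {"context": context, "omissions": omissions, "claim_checks": checks}
```

Whether the real system runs the two streams concurrently is unstated; the structural point is that the review is assembled from retrieved evidence plus an explicit omission hunt, not from the model’s parametric memory alone.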
The result is a review that reads more like a senior researcher’s critique than a checklist. Google reports that ScholarPeer delivers reviews rated as more critical and more literature-grounded than current state-of-the-art automated reviewers. Independent benchmarking by third parties has not yet been published, so those claims warrant some skepticism — but the architecture is meaningfully different from what came before.
PAT at NeurIPS: Conferences Are Moving First
What makes this wave of tools consequential isn’t the tools themselves — it’s institutional adoption. Google’s Paper Assistant Tool (PAT), a Gemini-powered pre-submission feedback system, has already been integrated into ICML and STOC, and NeurIPS announced in April that it will offer PAT to authors submitting to NeurIPS 2026.
Each eligible author receives one voucher to submit a single paper for AI feedback before the final deadline. The feedback is private to the authors, independent of the review process, and Google has committed that submissions will not be used to fine-tune its models. Papers and feedback are deleted within seven days of delivery.
The privacy architecture is deliberately conservative, and for good reason: academic communities are sensitive about data use. The fact that NeurIPS adopted it anyway suggests the perceived value is high enough to overcome that friction. If PAT becomes a standard pre-submission step at major venues, it normalizes AI-assisted authorship feedback at scale — a shift with downstream effects on how papers are written and how reviewers calibrate their expectations.
What This Means in Practice
PaperVizAgent and ScholarPeer target the publication pipeline at phases where AI has historically underperformed. Figure generation requires domain-specific precision and style knowledge; automated reviewing requires grounding in current literature, not just pattern matching against training data. Both agents use multi-step orchestration specifically to address those failure modes.
The immediate value is asymmetric by institution. Research groups at well-resourced universities with dedicated technical staff get less marginal benefit — they already have figure designers and extensive reviewer networks. The tools are more disruptive for smaller labs, independent researchers, and teams in regions where English-language peer review presents an additional barrier.
Institutions need to decide quickly how to handle this. If AI-assisted figure generation becomes routine, does it require disclosure? If ScholarPeer is used to draft a referee report, does that report carry the same intellectual accountability as a human-written one? These are not hypothetical questions — they are live policy questions for conference program chairs today. As we noted when Sakana’s AI Scientist passed peer review at Nature, institutions that wait for consensus will find that norms have already formed without them.
Google’s broader trajectory here is also worth watching. PaperVizAgent, ScholarPeer, and PAT are separate tools, but they address adjacent steps in the same pipeline — the one increasingly targeted by AI automation. The logical end state is an integrated authorship-to-review pipeline where AI agents handle figures, structure, literature grounding, and pre-submission review in sequence. Whether that end state helps science or homogenizes it is the more important question, and nobody has answered it yet.
Further Reading
- Google Research: Improving the Academic Workflow — the official announcement covering both agents’ design and benchmark results
- NeurIPS Blog: PAT Program Details — full terms, privacy commitments, and eligibility requirements for the NeurIPS 2026 author feedback program
- PaperVizAgent on GitHub — open-source codebase for figure generation; useful starting point for researchers who want to adapt the pipeline for their domain

