A 32-agent peer review for your paper, before the real peer review
Claude Code skills that run your research through a full academic pipeline—research, writing, staged integrity checks, and multi-perspective review—while keeping a human in the driver's seat.

What it does
ARS is a plugin for Claude Code that orchestrates academic paper production through a 10-stage pipeline: research, write, two integrity gates, peer review, revise, re-review, and finalize. It ships as modular skills—Deep Research (13 agents), Academic Paper (12 agents), Reviewer (7 agents), and a pipeline orchestrator—invoked via slash commands like /ars-plan or /ars-lit-review.
The tool explicitly refuses to write your paper for you. It handles citation hunting, formatting, statistical cross-checks, and logical consistency, while the human does the framing, method selection, and interpretation. A built-in Style Calibration module learns your voice from past work; Writing Quality Check flags machine-like prose patterns.
The interesting bit
The project is essentially a reaction to The AI Scientist—a fully autonomous system that actually got a paper through an ICLR 2025 workshop blind review. ARS’s authors cite that system’s failure modes (hallucinated results, bug-as-insight reframing, citation hallucinations) as the reason for keeping a human in the loop. The integrity gates even run a 7-mode blocking checklist against those exact failure modes.
More unusually, the Reviewer skill includes a Devil’s Advocate agent and an opt-in calibration mode where you feed it a gold-standard review set so it can measure its own false-negative and false-positive rates.
Key highlights
- Claim-level citation audit (v3.8): Optional
ARS_CLAIM_AUDIT=1fetches cited sources and judges whether the claim is actually supported, with five HIGH-WARN refusal classes including “claim-not-supported” and “fabricated-reference” - Cross-model verification: Optional
ARS_CROSS_MODELruns checks across different models to catch model-specific blind spots - Material Passport + repro_lock: Tracks provenance through the pipeline; optional lockfile documents configuration for reproducibility (with honest caveats that LLM outputs aren’t byte-reproducible)
- Cost estimate: ~$4–6 for a full 15,000-word paper pipeline per the docs
- Companion Experiment Agent: Separate skill for running code experiments or human studies with IRB checklists, then feeding verified results back into Stage 2
Caveats
- The project is tightly bound to Claude Code’s plugin system; you’ll need an Anthropic API key and recent Claude Code CLI/IDE extension
- Post-publication audit of the showcase example still found 21 issues in 68 references that three prior integrity rounds missed—suggesting the gates catch a lot, but not everything
- Corpus-scale evaluation of ARS itself against real hallucination rates is listed as future work
- CC BY-NC 4.0 license means no commercial use
Verdict
Grad students and early-career researchers drowning in lit reviews and citation formatting should try the /ars-plan Socratic dialogue. If you already have a mature LaTeX workflow, a human co-author network, and strong institutional library access, much of this may be redundant. Not for anyone looking to automate away authorship—the tool is explicit that it exists to make you write better, not to hide that you used AI.