Imbad0202/academic-research-skills

A 32-agent peer review for your paper, before the real peer review

Claude Code skills that run your research through a full academic pipeline—research, writing, staged integrity checks, and multi-perspective review—while keeping a human in the driver's seat.

28.6k stars · Python · Coding Assistants · LLMOps · Eval
Velocity (7d): +280 ★/day · Trend: steady

What it does

ARS is a plugin for Claude Code that orchestrates academic paper production through a 10-stage pipeline whose stages include research, writing, two integrity gates, peer review, revision, re-review, and finalization. It ships as modular skills (Deep Research with 13 agents, Academic Paper with 12, Reviewer with 7, plus a pipeline orchestrator), invoked via slash commands like /ars-plan or /ars-lit-review.

The tool explicitly refuses to write your paper for you. It handles citation hunting, formatting, statistical cross-checks, and logical consistency, while the human does the framing, method selection, and interpretation. A built-in Style Calibration module learns your voice from past work; Writing Quality Check flags machine-like prose patterns.

The interesting bit

The project is essentially a reaction to The AI Scientist—a fully autonomous system that actually got a paper through an ICLR 2025 workshop blind review. ARS’s authors cite that system’s failure modes (hallucinated results, bug-as-insight reframing, citation hallucinations) as the reason for keeping a human in the loop. The integrity gates even run a 7-mode blocking checklist against those exact failure modes.

More unusually, the Reviewer skill includes a Devil’s Advocate agent and an opt-in calibration mode where you feed it a gold-standard review set so it can measure its own false-negative and false-positive rates.
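
To make the calibration idea concrete, here is a minimal Python sketch of how false-positive and false-negative rates could be computed against a gold-standard set. It is not taken from ARS: the function name `calibration_rates`, the `Rates` type, and the boolean-per-item data layout are all hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    false_positive_rate: float
    false_negative_rate: float


def calibration_rates(gold: list[bool], predicted: list[bool]) -> Rates:
    """Compare reviewer flags against gold-standard flags.

    gold[i] is True when the gold-standard review marks item i as a real
    problem; predicted[i] is True when the reviewer flagged it.
    (Hypothetical layout, not the ARS data format.)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Flagged by the reviewer but not a real problem.
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    # A real problem that the reviewer missed.
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)

    negatives = sum(1 for g in gold if not g)
    positives = len(gold) - negatives

    # Guard against division by zero when one class is absent.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return Rates(fpr, fnr)


gold = [True, True, True, False, False]
predicted = [True, False, True, True, False]
rates = calibration_rates(gold, predicted)
print(f"FPR={rates.false_positive_rate:.2f} FNR={rates.false_negative_rate:.2f}")
# prints "FPR=0.50 FNR=0.33"
```

In this toy set the reviewer misses one of three real problems and raises one false alarm on two clean items. Reporting the two rates separately matters because a reviewer can look good on one while being poor on the other.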

Key highlights

  • Claim-level citation audit (v3.8): Setting the optional ARS_CLAIM_AUDIT=1 flag fetches cited sources and judges whether each claim is actually supported, with five HIGH-WARN refusal classes including “claim-not-supported” and “fabricated-reference”
  • Cross-model verification: Optional ARS_CROSS_MODEL runs checks across different models to catch model-specific blind spots
  • Material Passport + repro_lock: Tracks provenance through the pipeline; optional lockfile documents configuration for reproducibility (with honest caveats that LLM outputs aren’t byte-reproducible)
  • Cost estimate: ~$4–6 for a full 15,000-word paper pipeline per the docs
  • Companion Experiment Agent: Separate skill for running code experiments or human studies with IRB checklists, then feeding verified results back into Stage 2

Caveats

  • The project is tightly bound to Claude Code’s plugin system; you’ll need an Anthropic API key and a recent Claude Code CLI or IDE extension
  • Post-publication audit of the showcase example still found 21 issues in 68 references that three prior integrity rounds missed—suggesting the gates catch a lot, but not everything
  • Corpus-scale evaluation of ARS itself against real hallucination rates is listed as future work
  • CC BY-NC 4.0 license means no commercial use

Verdict

Grad students and early-career researchers drowning in lit reviews and citation formatting should try the /ars-plan Socratic dialogue. If you already have a mature LaTeX workflow, a human co-author network, and strong institutional library access, much of this may be redundant. Not for anyone looking to automate away authorship—the tool is explicit that it exists to make you write better, not to hide that you used AI.
