aiming-lab/AutoResearchClaw

A lobster that writes your PhD for you (maybe)

AutoResearchClaw turns a chat message into a full academic paper, with real citations and sandboxed experiments.

13.3k stars · Python · Agents · LLMOps · Eval
Velocity (7d): +156 ★/day · Trend: steady

What it does

AutoResearchClaw is a 23-stage pipeline that takes a research topic and produces what it bills as a conference-ready paper: LaTeX source, BibTeX with verified references, generated experiments, charts, and even multi-agent peer reviews. It pulls real literature from OpenAlex, Semantic Scholar, and arXiv, runs code in hardware-aware Docker sandboxes (GPU, MPS, or CPU, auto-detected), and targets NeurIPS/ICML/ICLR templates. You can run it fully autonomously or use Co-Pilot mode to intervene at key decision points.
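To make "hardware-aware" concrete, here is a minimal sketch of how a runner might choose between GPU, MPS, and CPU before launching a sandbox. It is not taken from the AutoResearchClaw codebase: the function name `detect_device` and its heuristics (an `nvidia-smi` binary on `PATH` means CUDA, macOS on arm64 means Apple's MPS, anything else means CPU) are assumptions for illustration.

```python
import platform
import shutil


def detect_device(system=None, machine=None, has_nvidia_smi=None):
    """Pick an accelerator label using simple, hypothetical heuristics.

    Assumptions (not AutoResearchClaw's actual logic):
    - an `nvidia-smi` binary on PATH implies a usable NVIDIA GPU ("cuda");
    - macOS on arm64 (Apple Silicon) implies Metal Performance Shaders ("mps");
    - anything else falls back to "cpu".
    The keyword arguments override host detection so the logic is testable.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    if has_nvidia_smi is None:
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    if has_nvidia_smi:
        return "cuda"
    if system == "Darwin" and machine == "arm64":
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # Report what this heuristic would pick on the current machine.
    print(detect_device())
```

A runner could then branch on the returned label, for example adding Docker's `--gpus all` flag to `docker run` when the result is `"cuda"`. The overridable arguments exist only so the heuristic can be exercised on any host.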

The interesting bit

The project doesn’t just generate text; it tries to close the loop. Failed experiments trigger self-healing, and fabricated citations are filtered out by a 4-layer verification system (arXiv, CrossRef, DataCite, and an LLM check). A companion system called MetaClaw extracts “lessons” from failed runs and injects them into future pipeline runs. The team also released ARC-Bench, a 55-topic benchmark for evaluating autonomous research across ML, physics, biology, and statistics.
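One plausible shape for a layered citation check is a cascade: try each source in turn and drop any reference that no layer confirms. The sketch below assumes first-confirmation-wins ordering (arXiv, CrossRef, DataCite, then an LLM check) and replaces every network lookup with a hypothetical in-memory stub; the identifiers, `Citation`, `verify`, and `filter_citations` are all illustrative, and the real system may combine its layers differently, for example by requiring agreement between them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Citation:
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None


# Hypothetical offline stand-ins for the real lookups. A real checker would
# query arXiv, CrossRef, and DataCite over the network instead.
ARXIV_IDS = {"0000.00001"}
CROSSREF_DOIS = {"10.0000/example.crossref"}
DATACITE_DOIS = {"10.0000/example.datacite"}


def in_arxiv(c: Citation) -> bool:
    return c.arxiv_id in ARXIV_IDS


def in_crossref(c: Citation) -> bool:
    return c.doi in CROSSREF_DOIS


def in_datacite(c: Citation) -> bool:
    return c.doi in DATACITE_DOIS


def llm_judge(c: Citation) -> bool:
    # Placeholder for an LLM-based check; this stub is conservative and
    # never confirms a citation on its own.
    return False


# Layers are tried in order; the first one that confirms wins.
LAYERS: List[Tuple[str, Callable[[Citation], bool]]] = [
    ("arxiv", in_arxiv),
    ("crossref", in_crossref),
    ("datacite", in_datacite),
    ("llm", llm_judge),
]


def verify(c: Citation, layers=LAYERS) -> Optional[str]:
    """Return the name of the first layer that confirms the citation, or None."""
    for name, check in layers:
        if check(c):
            return name
    return None


def filter_citations(citations, layers=LAYERS):
    """Split citations into (kept, dropped); unconfirmed ones are dropped."""
    kept, dropped = [], []
    for c in citations:
        (kept if verify(c, layers) else dropped).append(c)
    return kept, dropped
```

Swapping a stub for a real lookup only changes the body of the corresponding checker; the cascade itself stays the same.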

Key highlights

  • 6 human-in-the-loop (HITL) intervention modes, from full-auto to step-by-step co-pilot
  • Domain-specific experiment agents: high-energy physics (MadGraph5), biology (COBRApy), statistics, plus generic Docker for chemistry/materials
  • Cross-platform: runs via CLI or bridges to Discord, Telegram, Lark, WeChat through OpenClaw
  • 2,699 tests passed (per their badge); MIT licensed
  • Self-described as looking for testers, whose feedback is meant to shape the next version

Caveats

  • The “+18.3% robustness” claim for the MetaClaw integration is self-reported, with no detail on methodology or baseline
  • Showcase papers are generated, not peer-reviewed; quality unclear without independent evaluation
  • README is heavy on feature lists and light on actual limitations or failure modes

Verdict

Worth a look if you’re researching LLM agents for scientific workflows or need a structured paper draft to iterate on. It is not a replacement for actually doing science; think of it as an extremely ambitious scaffolding tool that sometimes builds the whole house.
