SakanaAI/AI-Scientist

An LLM that writes its own papers — and runs the experiments

SakanaAI's system automates the full research loop: idea generation, code execution, paper writing, and even an LLM-generated review of the result.

13.9k stars · Jupyter Notebook · Agents, Domain Apps
Velocity (7d): +21 ★ / day · Trend: steady

What it does

The AI Scientist is an end-to-end pipeline that hands a research template to an LLM and lets it rip. The model proposes hypotheses, writes and executes experiment code, generates plots, writes a LaTeX paper with citations, and can even produce an LLM-generated review of its own work. It ships with three built-in templates covering NanoGPT, 2D diffusion, and grokking studies.

The interesting bit

The system doesn’t just draft prose: it executes the code it writes, so it can retry failed experiments (or chase dead ends) without human intervention. The authors candidly recommend reading the generated “Claude papers” to see where the system shines and where it hallucinates its way into nonsense.
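
The execute-and-retry idea can be illustrated with a minimal sketch. This is not the project's code: `run_script`, `iterate`, and the `revise` callback (standing in for an LLM call that returns a corrected script) are hypothetical names chosen for illustration, and the retry limit is an assumption.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_script(code: str, timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run `code` in a child Python process; return (succeeded, stdout or error text)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "experiment.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s}s"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def iterate(
    code: str,
    revise: Callable[[str, str], str],  # hypothetical: (old code, error text) -> new code
    max_attempts: int = 3,
) -> Optional[str]:
    """Run the script, feeding each failure back to `revise`, up to `max_attempts` runs."""
    for _ in range(max_attempts):
        ok, output = run_script(code)
        if ok:
            return output
        code = revise(code, output)
    return None  # every attempt failed
```

In a real pipeline `revise` would send the failing script and its traceback back to the model. Note that a child process like this provides no isolation by itself, which is why sandboxing matters for anything of this shape.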

Key highlights

  • Supports frontier models: GPT-4o, Claude 3.5 Sonnet, DeepSeek, Gemini, and others via OpenRouter
  • Includes literature search via Semantic Scholar or OpenAlex for real citations
  • Multi-GPU parallelization with --parallel for running multiple ideas at once
  • Community-contributed templates extend beyond the three official domains
  • Generated papers are compiled to PDF with proper LaTeX formatting
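
As a rough illustration of running several ideas at once across GPUs, the sketch below starts one worker per GPU and lets each pull ideas from a shared queue. It is an assumption about how such scheduling could look, not a description of what `--parallel` does internally; `run_ideas_in_parallel` and the `run_idea` callback are hypothetical names. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for restricting a process to specific devices.

```python
import os
import queue
import threading
from typing import Callable, Dict, List


def run_ideas_in_parallel(
    ideas: List[str],
    gpu_ids: List[int],
    run_idea: Callable[[str, Dict[str, str]], str],  # hypothetical: (idea, env) -> outcome
) -> Dict[str, str]:
    """Start one worker per GPU; each worker takes ideas from a shared queue until it is empty."""
    pending: "queue.Queue[str]" = queue.Queue()
    for idea in ideas:
        pending.put(idea)

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker(gpu_id: int) -> None:
        # Environment a child process would inherit so that it only sees this GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        while True:
            try:
                idea = pending.get_nowait()
            except queue.Empty:
                return
            outcome = run_idea(idea, env)
            with lock:
                results[idea] = outcome

    threads = [threading.Thread(target=worker, args=(gpu_id,)) for gpu_id in gpu_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A queue rather than a fixed round-robin split keeps every GPU busy when ideas take different amounts of time to run.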

Caveats

  • Security warning: The system executes LLM-written code with full autonomy, including potential web access and arbitrary package installation; the README explicitly demands containerization
  • Linux + NVIDIA CUDA only; CPU-only machines are “infeasible” and other OSes need “significant adjustments”
  • Installing texlive-full is slow and can stall waiting for keyboard input
  • The README recommends only frontier models more capable than the original GPT-4
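
To make the containerization advice concrete, here is a minimal sketch that builds a restrictive `docker run` command line. The image name `ai-scientist-sandbox:latest`, the `experiment.py` entry point, and the memory limit are placeholders rather than anything the project ships; the flags themselves (`--rm`, `--gpus`, `--memory`, `--volume`, `--workdir`, `--network none`) are standard Docker options.

```python
from pathlib import Path
from typing import List


def sandbox_command(
    workdir: str,
    image: str = "ai-scientist-sandbox:latest",  # placeholder image name
    allow_network: bool = False,
    memory_limit: str = "16g",  # assumed limit, adjust to the host
) -> List[str]:
    """Build an argv list that runs one experiment inside a throwaway container."""
    argv = [
        "docker", "run", "--rm",
        "--gpus", "all",
        "--memory", memory_limit,
        # Mount only the experiment directory, not the whole home directory.
        "--volume", f"{Path(workdir).resolve()}:/workspace",
        "--workdir", "/workspace",
    ]
    if not allow_network:
        # No network access: blocks downloads and outbound requests from generated code.
        argv += ["--network", "none"]
    argv += [image, "python", "experiment.py"]  # placeholder entry point
    return argv
```

Passing the list to `subprocess.run` would launch the container. Disabling the network also blocks LLM API calls and literature search, so a setup like this fits only the step that executes generated experiment code.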

Verdict

Researchers in ML/AI who want to automate tedious experimental sweeps should look closely — but treat it like a chemistry set with the safety goggles on. If you’re not comfortable sandboxing arbitrary LLM-generated code, this is not your project.
