An LLM that writes its own papers — and runs the experiments
SakanaAI's system automates the full research loop: idea generation, code execution, paper writing, and even peer review.

What it does
The AI Scientist is an end-to-end pipeline that hands a research template to an LLM and lets it rip. The model proposes hypotheses, writes and executes experiment code, generates plots, writes a LaTeX paper with citations, and can even produce an LLM-generated review of its own work. It ships with three built-in templates covering NanoGPT, 2D diffusion, and grokking studies.
The interesting bit
The system doesn’t just draft prose — it actually executes the code it writes, which means it can iterate on failed experiments or chase dead ends autonomously. The authors candidly recommend reading the generated “Claude papers” to see where the system shines and where it hallucinates its way into nonsense.
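Here is a minimal Python sketch of the "execute, fail, revise" loop, written under loose assumptions. The names run_with_retries and fix_code are hypothetical, and fix_code stands in for whatever LLM call proposes a corrected script. This is not The AI Scientist's actual code. Running a script in a subprocess is also not a sandbox, so the containerization advice under Caveats still applies.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_with_retries(
    code: str,
    fix_code: Callable[[str, str], str],  # hypothetical stand-in for an LLM "fix this" call
    max_attempts: int = 3,
    timeout_s: float = 60.0,
) -> Optional[str]:
    """Run generated Python code; on failure, ask fix_code for a revision.

    Returns stdout from the first successful run, or None if every attempt fails.
    """
    for attempt in range(max_attempts):
        with tempfile.TemporaryDirectory() as workdir:
            script = Path(workdir) / "experiment.py"
            script.write_text(code)
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                    cwd=workdir,
                )
            except subprocess.TimeoutExpired:
                error = f"timed out after {timeout_s}s"
            else:
                if result.returncode == 0:
                    return result.stdout
                error = result.stderr  # traceback text is what gets fed back
        # Only ask for a revision if there is another attempt left to use it.
        if attempt + 1 < max_attempts:
            code = fix_code(code, error)
    return None
```

The design point is that the failure output, not just a pass/fail bit, is handed back to the model. That is what lets the system iterate on a broken experiment instead of simply giving up.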
Key highlights
- Supports frontier models: GPT-4o, Claude 3.5 Sonnet, DeepSeek, Gemini, and others via OpenRouter
- Includes literature search via Semantic Scholar or OpenAlex for real citations
- Multi-GPU parallelization with --parallel for running multiple ideas at once
- Community-contributed templates extend beyond the three official domains
- Generated papers are compiled to PDF with proper LaTeX formatting
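As a rough illustration of what running multiple ideas at once can mean on a multi-GPU box, the sketch below gives each idea exclusive use of one GPU by setting CUDA_VISIBLE_DEVICES for that run. It is a hypothetical scheduler, not how the project implements --parallel. The names run_ideas_on_gpus and run_one are made up. In practice, run_one would presumably launch an experiment script as a subprocess with the supplied environment.

```python
import os
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_ideas_on_gpus(
    ideas: List[str],
    gpu_ids: List[int],
    run_one: Callable[[str, Dict[str, str]], str],  # hypothetical per-idea runner
) -> List[str]:
    """Run each idea with exclusive use of one GPU; results keep input order."""
    free_gpus: "queue.Queue[int]" = queue.Queue()
    for gpu in gpu_ids:
        free_gpus.put(gpu)

    def worker(idea: str) -> str:
        gpu = free_gpus.get()  # blocks until some GPU is free
        try:
            # Child processes launched with this env will only see the one GPU.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            return run_one(idea, env)
        finally:
            free_gpus.put(gpu)  # hand the GPU back for the next idea

    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        return list(pool.map(worker, ideas))
```

A queue of free GPU ids, rather than a fixed round-robin assignment, keeps two runs from landing on the same device when some ideas finish faster than others.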
Caveats
- Security warning: The system executes LLM-written code with full autonomy, including potential web access and arbitrary package installation; the README explicitly demands containerization
- Linux + NVIDIA CUDA only; CPU-only machines are “infeasible” and other OSes need “significant adjustments”
- texlive-full installation is notoriously slow and interactive
- Only frontier models above original GPT-4 capability are recommended
Verdict
Researchers in ML/AI who want to automate tedious experimental sweeps should look closely — but treat it like a chemistry set with the safety goggles on. If you’re not comfortable sandboxing arbitrary LLM-generated code, this is not your project.