Stanford's STORM automates the homework you pretend to do
An LLM system that researches a topic, asks its own questions, and writes a cited report—because 'I'll Google it' was never this thorough.
What it does
STORM is a Python system that writes Wikipedia-style articles from scratch. It searches the internet, generates an outline, simulates conversations between a writer and topic experts, then produces a full-length article with citations. Co-STORM adds a collaborative mode where humans can observe or steer a multi-agent discourse, with a dynamic mind map tracking the growing knowledge graph.
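The pipeline in this paragraph (research, outline, write with citations, polish) can be modeled as plain functions. This is a toy sketch, not STORM's code. Every name here (Snippet, Article, research, generate_outline, write_article, polish) is hypothetical, and the research and outline stages are stubs that stand in for search and LLM calls. The point is the data flow, especially how each sentence carries a [n] marker that indexes into a reference list collected during research.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    url: str
    text: str


@dataclass
class Article:
    title: str
    sections: dict = field(default_factory=dict)    # heading -> list of cited sentences
    references: list = field(default_factory=list)  # position i holds citation [i + 1]

    def render(self) -> str:
        lines = [self.title]
        for heading, sentences in self.sections.items():
            lines.append(heading)
            lines.extend(sentences)
        lines.append("References")
        lines.extend(f"[{i}] {url}" for i, url in enumerate(self.references, start=1))
        return "\n".join(lines)


def research(topic: str) -> list:
    # Stub for the search + simulated-conversation stage: placeholder snippets only.
    return [
        Snippet("https://example.org/a", f"Placeholder fact A about {topic}."),
        Snippet("https://example.org/b", f"Placeholder fact B about {topic}."),
    ]


def generate_outline(topic: str, snippets: list) -> list:
    # Stub: a real system would prompt an LLM with the gathered material.
    return ["Background", "Applications"][: len(snippets)]


def write_article(topic: str, outline: list, snippets: list) -> Article:
    article = Article(title=topic)
    for heading, snip in zip(outline, snippets):
        # Register each source once, then cite it by its 1-based position.
        if snip.url not in article.references:
            article.references.append(snip.url)
        ref_id = article.references.index(snip.url) + 1
        article.sections[heading] = [f"{snip.text} [{ref_id}]"]
    return article


def polish(article: Article) -> Article:
    # Stub for the polishing pass: prepend a one-line lead section.
    lead = f"{article.title} is covered in {len(article.sections)} sections."
    article.sections = {"Summary": [lead], **article.sections}
    return article


topic = "Example topic"
snippets = research(topic)
article = polish(write_article(topic, generate_outline(topic, snippets), snippets))
print(article.render())
```

Keeping citations as indices into a shared reference list means a source cited in several sections is listed once and always gets the same number.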
The interesting bit
The core insight is that asking good questions is harder than answering them. STORM doesn’t just prompt an LLM to ask questions—it first discovers “perspectives” by surveying existing articles on similar topics, then simulates grounded conversations to generate follow-ups. It’s essentially automated intellectual curiosity, with a budget line for API calls.
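The two ideas in this paragraph, discovering perspectives first and then running grounded question-and-answer turns, can be illustrated with a toy sketch. It assumes stub functions in place of LLM and search calls. Perspectives are approximated by the most common section headings across related articles, and the simulated "expert" answers only from retrieved text. None of these function names come from STORM's codebase, and word-overlap retrieval is a deliberately crude stand-in for a real search backend.

```python
import re
from collections import Counter


def discover_perspectives(related_outlines, k=3):
    # Proxy for perspective discovery: the most common section headings
    # across articles on related topics become candidate perspectives.
    counts = Counter(h.lower() for outline in related_outlines for h in outline)
    return [heading for heading, _ in counts.most_common(k)]


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, corpus, top_k=1):
    # Stand-in for a search backend: rank snippets by word overlap with the query.
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def ask(perspective, topic, history):
    # Stand-in for the writer LLM: open with a perspective-specific question,
    # then follow up on the expert's most recent answer.
    if not history:
        return f"What is the {perspective} of {topic}?"
    return f"Can you say more about this: {history[-1][1]}"


def simulate_conversation(perspective, topic, corpus, turns=2):
    history = []  # list of (question, grounded answer) pairs
    for _ in range(turns):
        question = ask(perspective, topic, history)
        evidence = retrieve(question, corpus)
        # The "expert" answers only from retrieved evidence, so every answer is grounded.
        history.append((question, " ".join(evidence)))
    return history


related = [["History", "Design", "Reception"], ["History", "Design"], ["History", "Legacy"]]
corpus = [
    "The history of the bridge is placeholder text A.",
    "The design of the bridge is placeholder text B.",
]
perspectives = discover_perspectives(related, k=2)
transcripts = {p: simulate_conversation(p, "the bridge", corpus) for p in perspectives}
```

Each perspective steers the opening question toward different evidence, which is why surveying related articles first produces broader coverage than asking one generic question repeatedly.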
Key highlights
- Modular pipeline: separate models for conversation simulation, outline generation, article writing, and polishing—cheaper models for grunt work, expensive ones for final output
- Supports 9+ search/retrieval backends, including Bing, Google, DuckDuckGo, Tavily, and vector search over your own documents via VectorRM
- As of v1.1.0, uses litellm for unified access to language and embedding models
- Co-STORM’s collaborative protocol, with turn management and a live-updating mind map that gives humans and agents a shared conceptual space
- 70,000+ people have tried the live research preview; STORM was presented at NAACL 2024 and Co-STORM at EMNLP 2024
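The modular design in the first two highlights comes down to routing each pipeline stage to its own model and treating the search backend as a swappable component. Here is a minimal sketch under those assumptions. StageModels, Retriever, InMemoryRetriever, run_stage, and the model names are all hypothetical, and they do not reproduce STORM's configuration classes or litellm's API.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Retriever(Protocol):
    # Any backend (web search, vector store, ...) only needs this one method.
    def search(self, query: str, k: int) -> List[str]: ...


@dataclass(frozen=True)
class StageModels:
    # Hypothetical per-stage routing; model names are placeholders.
    conv_simulator: str
    outline_gen: str
    article_gen: str
    article_polish: str

    def for_stage(self, stage: str) -> str:
        if stage not in self.__dataclass_fields__:
            raise ValueError(f"unknown stage: {stage}")
        return getattr(self, stage)


class InMemoryRetriever:
    """Toy backend: case-insensitive substring match over local documents."""

    def __init__(self, docs: List[str]):
        self.docs = docs

    def search(self, query: str, k: int) -> List[str]:
        return [d for d in self.docs if query.lower() in d.lower()][:k]


def run_stage(stage: str, models: StageModels, retriever: Retriever, query: str) -> str:
    model = models.for_stage(stage)
    evidence = retriever.search(query, k=2)
    # A real system would send the evidence to `model`; here we only report the routing.
    return f"[{model}] {stage} with {len(evidence)} snippet(s)"


models = StageModels(
    conv_simulator="small-cheap-model",
    outline_gen="large-model",
    article_gen="large-model",
    article_polish="large-model",
)
retriever = InMemoryRetriever(["Storm notes", "storm data", "unrelated"])
print(run_stage("conv_simulator", models, retriever, "storm"))
```

The design choice is structural typing: any object with a search(query, k) method satisfies Retriever, so a web-search client and a local vector index are interchangeable without a shared base class.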
Caveats
- The authors explicitly state it “cannot produce publication-ready articles” without significant editing
- Requires setting up multiple API keys and model configurations; the STORMWikiRunner constructor is not a one-liner
- Co-STORM’s human-in-the-loop mode needs active user engagement to be useful, not just passive consumption
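The setup burden mirrors the runtime cost. Every simulated turn triggers several LLM and search calls, so totals grow multiplicatively with the number of perspectives and turns. Below is a back-of-envelope sketch, assuming a simple per-turn call model. RunShape and estimate_calls are hypothetical, and every default value is invented for illustration rather than measured from STORM.

```python
from dataclasses import dataclass


@dataclass
class RunShape:
    # Every default below is an illustrative assumption, not a measured STORM value.
    perspectives: int = 3
    turns_per_conversation: int = 3
    llm_calls_per_turn: int = 3   # e.g. ask a question, write search queries, answer
    queries_per_turn: int = 3
    outline_calls: int = 2        # e.g. draft an outline, then refine it
    sections: int = 6             # assume one article-writing call per section
    polish_calls: int = 2         # e.g. write a lead section, then deduplicate


def estimate_calls(shape: RunShape) -> tuple:
    """Return (llm_calls, search_calls) for one article under this toy model."""
    turns = shape.perspectives * shape.turns_per_conversation
    llm_calls = (
        turns * shape.llm_calls_per_turn
        + shape.outline_calls
        + shape.sections
        + shape.polish_calls
    )
    search_calls = turns * shape.queries_per_turn
    return llm_calls, search_calls


print(estimate_calls(RunShape()))                # (37, 27)
print(estimate_calls(RunShape(perspectives=6)))  # (64, 54)
```

Under these assumptions, doubling the number of perspectives takes LLM calls from 37 to 64 and search calls from 27 to 54. The research phase dominates, while the outline, writing, and polishing costs stay fixed.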
Verdict
Worth exploring if you regularly produce research briefs, literature reviews, or structured knowledge bases. Skip it if you need polished prose out of the box, or if your budget can’t handle multi-model, multi-search API calls per article.