← all repositories
TIGER-AI-Lab/OpenResearcher

A 30B model that beats GPT-4.1 at deep research—fully open

OpenResearcher publishes the whole recipe: data, model, training code, and evaluation, so you can replicate or extend deep-research agents without proprietary APIs.

OpenResearcher
Velocity · 7d
+6.0
★ / day
Trend
steady
collecting data…
star history

What it does

OpenResearcher is a fully open-source pipeline for training agentic LLMs to perform long-horizon web research. It includes a 30B-A3B model, a 96K-turn trajectory dataset distilled from GPT-OSS-120B, a self-built retriever over an ~11B-token corpus, and evaluation frameworks for benchmarks like BrowseComp-Plus and GAIA. The goal is to let anyone reproduce—and improve—deep-research agents without depending on closed commercial systems.

The interesting bit

The project achieves 54.8% on BrowseComp-Plus, which the README claims surpasses GPT-4.1, Claude Opus 4, Gemini 2.5 Pro, DeepSeek-R1, and Tongyi-DeepResearch. NVIDIA has adopted the data for its Nemotron 3 Ultra model and NeMo Data Designer. The retriever runs locally, so training doesn’t require paid search APIs—an unusual cost-cutting move in this space.

Key highlights

  • Full stack is open: dataset, 30B-A3B model weights, training code, distillation recipe, and evaluation framework
  • Self-hosted dense retrieval over ~11B tokens eliminates external search API dependency for training
  • Scores 54.8% on BrowseComp-Plus; also evaluated on BrowseComp, GAIA, and xbench-DeepResearch
  • NVIDIA Nemotron family and NeMo Data Designer have adopted the data
  • Hardware baseline is 8× A100 80GB, though other setups are possible with parameter tweaks

Caveats

  • The README is sparse on architectural details for the 30B-A3B model (the “A3B” suffix is unexplained)
  • Evaluation comparisons against closed models are reported by the authors, not independently verified
  • BrowseComp-Plus local search requires separate embedding model deployment and significant GPU resources

Verdict

Worth studying if you’re building research agents and need a reproducible baseline, or if you’re skeptical of black-box “deep research” features and want to inspect the gears. Skip if you need something that runs comfortably on consumer hardware or want a polished end-user product rather than a research artifact.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.