Alibaba-NLP/DeepResearch

Alibaba's 30B research agent runs on 3.3B active params

A sparse Mixture-of-Experts model trained end-to-end for multi-step web research, not just chat.

19.3k stars · Python · Agents, Language Models · +38 stars/day over the last 7 days (trend: steady)

What it does

Tongyi DeepResearch is a 30.5B-parameter Mixture-of-Experts model that activates only 3.3B parameters per token. It is built specifically for long-horizon information-seeking: the model plans searches, reads web pages, parses uploaded files, and reasons across multiple steps to answer complex questions. It supports two inference modes: a standard ReAct loop for evaluation, and a heavier “IterResearch” mode that scales compute at test time for harder tasks.
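To make the ReAct mode concrete, here is a minimal sketch of a reason-act-observe loop. It is not the repository's inference code: the tool registry, the `scripted_policy` stand-in for the model, and the step budget are all hypothetical, chosen only so the loop runs without any API keys.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool: a stand-in for a real web search call.
def fake_search(query: str) -> str:
    return f"results for: {query}"

# Hypothetical tool registry; the real agent exposes its own tool set.
TOOLS: Dict[str, Callable[[str], str]] = {"search": fake_search}

def scripted_policy(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the model: returns (action, argument).

    A real agent would call the LLM here and parse its output.
    """
    if not any(line.startswith("Observation:") for line in history):
        return "search", "Tongyi DeepResearch active parameters"
    return "answer", "3.3B active parameters per token"

def react_loop(
    question: str,
    policy: Callable[[List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing its result."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "answer":
            return arg
        # Unknown tools become an observation so the policy can recover.
        if action in tools:
            observation = tools[action](arg)
        else:
            observation = f"unknown tool: {action}"
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {observation}")
    return "no answer within step budget"

if __name__ == "__main__":
    print(react_loop("How many parameters are active per token?", scripted_policy, TOOLS))
```

The step budget matters in practice: a long-horizon agent that never decides to answer would otherwise keep calling tools indefinitely.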

The interesting bit

Most open models are trained to chat; this one is trained to search. The team built a fully automated synthetic data pipeline for agentic pre-training, then ran large-scale continual pre-training followed by end-to-end reinforcement learning with a customized on-policy GRPO variant. According to the README, the resulting model leads several agentic search benchmarks, which test multi-step tool use rather than general language modeling.

Key highlights

  • Sparse MoE architecture: 30.5B total params, 3.3B active per token, 128K context window
  • Trained with automated synthetic data generation, continual pre-training on agentic interactions, and token-level on-policy RL with leave-one-out advantage estimation
  • Supports ReAct and IterResearch inference paradigms; the latter uses test-time scaling for maximum performance
  • Available via HuggingFace, ModelScope, OpenRouter API, and Alibaba’s Bailian cloud service
  • Evaluation scripts and inference code provided; requires Python 3.10 and multiple API keys (Serper, Jina, OpenAI-compatible, Dashscope)
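The "leave-one-out advantage estimation" mentioned above can be illustrated with the standard leave-one-out baseline: each rollout's reward is compared against the mean reward of the other rollouts sampled for the same prompt. This is a sketch of that general technique, not the team's customized GRPO variant; copying one sequence-level advantage onto every token of a rollout is an assumption about what "token-level" means here.

```python
from typing import List

def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of rollout i = r_i minus the mean reward of the other rollouts."""
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

def token_level_advantages(
    rewards: List[float], token_counts: List[int]
) -> List[List[float]]:
    """Assumption: every token in a rollout shares that rollout's advantage."""
    advantages = leave_one_out_advantages(rewards)
    return [[a] * n for a, n in zip(advantages, token_counts)]

if __name__ == "__main__":
    # Three rollouts for one prompt; only the first one answered correctly.
    print(leave_one_out_advantages([1.0, 0.0, 0.0]))  # [1.0, -0.5, -0.5]
```

Because a rollout's own reward is excluded from its baseline, the baseline does not depend on the sample it is scoring, which is the usual motivation for the leave-one-out form.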

Caveats

  • Online demos are explicitly marked “for quick exploration only” and may fail intermittently due to model latency and tool QPS limits; local deployment or Bailian is recommended for stability
  • Setup is involved: you need API keys for search, page reading, summarization, file parsing, and optionally a Python sandbox
  • The README claims “state-of-the-art performance” on several benchmarks but does not provide absolute scores or comparison tables in the excerpt shown
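Given how many keys the setup needs, a preflight check before launching inference can save a failed run. The sketch below assumes the keys are supplied as environment variables; the variable names are hypothetical placeholders, so substitute the names the repository's own configuration actually expects.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical variable names, keyed by provider. Replace them with the
# names used in the repository's configuration.
REQUIRED_KEYS: Dict[str, str] = {
    "Serper": "SERPER_API_KEY",
    "Jina": "JINA_API_KEY",
    "OpenAI-compatible": "OPENAI_API_KEY",
    "Dashscope": "DASHSCOPE_API_KEY",
}

def missing_keys(env: Mapping[str, str], required: Mapping[str, str]) -> List[str]:
    """Return the providers whose variable is unset or empty."""
    return [provider for provider, var in required.items() if not env.get(var)]

if __name__ == "__main__":
    missing = missing_keys(os.environ, REQUIRED_KEYS)
    if missing:
        print("Missing API keys for: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```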

Verdict

Worth a look if you are building autonomous research agents or studying RL training for tool use. Skip it if you want a drop-in chat replacement or lack the API budget and patience to wire up several external services.
