firecrawl/web-agent

Firecrawl open-sources its web-research stack

The hosted agent you can pay for, now forkable and model-swappable.

1.1k stars · TypeScript · Agents · Data Tooling
Velocity (7d): +15 ★/day · Trend: steady

What it does

This repo is the open-source foundation behind Firecrawl’s hosted research agent. It gives you a layered toolkit for building autonomous web-research bots: Next.js and Express templates on top, an orchestration core in the middle, and Firecrawl’s search/scrape/interact tools at the bottom. You scaffold via CLI, swap in your own models, and deploy where you like.

The interesting bit

The architecture is deliberately stacked like a diner menu: start with a full Next.js app or peel down to raw primitives. The orchestration rides on LangChain’s Deep Agents for the plan-act loop and parallel subagent spawning, which saves reinventing the agent harness. Skills are just markdown files auto-discovered from disk and loaded on demand by middleware. It’s a pragmatic glue job rather than a from-scratch framework.
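Here is a minimal sketch of the "discover on disk, load on demand" pattern for skills. It assumes each skill lives in its own directory as a `SKILL.md` file. `discoverSkills`, `loadSkill` and `SkillEntry` are hypothetical names, not the repo's actual middleware API. Discovery only records file paths, and a skill's markdown body is read from disk the first time it is requested.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical registry entry: the path is known up front, the body is read lazily.
interface SkillEntry {
  name: string;  // name of the directory containing SKILL.md (assumed convention)
  file: string;  // absolute path to the SKILL.md file
  body?: string; // cached markdown, filled in on first load
}

// Walk `root` recursively and register every SKILL.md without reading its contents.
function discoverSkills(root: string): Map<string, SkillEntry> {
  const registry = new Map<string, SkillEntry>();
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && entry.name === "SKILL.md") {
        const name = path.basename(dir);
        registry.set(name, { name, file: path.resolve(full) });
      }
    }
  };
  walk(root);
  return registry;
}

// Return a skill's markdown, touching the disk only on the first request.
function loadSkill(registry: Map<string, SkillEntry>, name: string): string {
  const skill = registry.get(name);
  if (!skill) throw new Error(`unknown skill: ${name}`);
  if (skill.body === undefined) {
    skill.body = fs.readFileSync(skill.file, "utf8");
  }
  return skill.body;
}
```

Keeping discovery separate from loading means the agent can list every available skill to the planner cheaply. Only the playbooks it actually picks get pulled into context.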

Key highlights

  • Layered stack: Next.js template → Express template → Agent Core library → AI SDK → base SDK → REST API
  • Parallel subagents: Independent workers with isolated browser sessions, spawned via Deep Agents’ task tool
  • Skills as markdown: Reusable SKILL.md playbooks in agent-core/src/skills/definitions/, auto-discovered and loaded on demand
  • Structured output: JSON formatting plus bashExec data processing via Vercel’s just-bash
  • Streaming support: Built into the Next.js template and core examples
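The parallel-subagent idea can be sketched as a plain `Promise.all` fan-out. This is not Deep Agents' task tool. `Session`, `openSession`, `runSubagents` and the `Worker` callback are hypothetical stand-ins, and the session object only models an isolated browser session. Each task opens its own fresh session, so no worker can see another's state. The collected results are plain objects, ready for structured JSON output.

```typescript
// Hypothetical stand-in for an isolated browser session.
interface Session {
  id: number;
  visited: string[];
}

interface SubagentResult {
  task: string;
  sessionId: number;
  findings: string[];
}

// A worker researches one task using only the session it was handed.
type Worker = (task: string, session: Session) => Promise<string[]>;

let nextSessionId = 0;

// Every call returns a brand-new session with its own state.
function openSession(): Session {
  return { id: nextSessionId++, visited: [] };
}

// Spawn one worker per task concurrently; results come back in task order.
async function runSubagents(tasks: string[], worker: Worker): Promise<SubagentResult[]> {
  return Promise.all(
    tasks.map(async (task) => {
      const session = openSession(); // fresh, unshared state per subagent
      const findings = await worker(task, session);
      return { task, sessionId: session.id, findings };
    }),
  );
}
```

`Promise.all` preserves input order, so results line up with `tasks` even when workers finish at different times. However, one failing worker rejects the whole batch. `Promise.allSettled` is the usual alternative when partial results are acceptable.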

Caveats

  • The hosted “Spark 1” models are proprietary; you bring your own LLM to the open-source version
  • Documentation is thin — the README points outward to docs.firecrawl.dev and npm packages for most layers
  • 1,112 stars suggest early traction, but the sources don’t demonstrate real-world durability at scale

Verdict

Worth a look if you’re building research agents and want a head start on orchestration, web tools, and deployment patterns without marrying a closed platform. Skip it if you need a fully documented, batteries-included framework with training wheels attached.
