Is AnyCrawl open source?

Yes — any4ai/AnyCrawl is open source, released under the MIT license.

What language is AnyCrawl written in?

any4ai/AnyCrawl is primarily written in TypeScript.

How popular is AnyCrawl?

any4ai/AnyCrawl has 3.4k stars on GitHub. Growth is currently cooling off, at about +0.6 stars per day over the past 7 days.

Where can I find AnyCrawl?

any4ai/AnyCrawl is on GitHub at https://github.com/any4ai/AnyCrawl.


any4ai/AnyCrawl

A scraper that speaks fluent LLM

Node.js crawler that turns raw web pages into structured JSON via an LLM layer, with SERP support and a self-hosted API.

★ 3.4k stars · TypeScript · Data Tooling · RAG · Search


Velocity (7d): +0.6 ★ / day · Trend: ↘ cooling

What it does

AnyCrawl is a self-hostable Node.js/TypeScript crawling toolkit with three main jobs: scrape single pages, crawl entire sites, and extract structured search results from Google (with Bing and Baidu promised). It exposes everything through a REST API and can hand off page content to an LLM for structured JSON extraction.

The interesting bit

The LLM extraction layer is the hook. Instead of just dumping HTML or markdown, you pass a JSON schema and the tool asks an LLM to fill it in: company mission, employee count, boolean flags, whatever you define. It also supports Atlas Cloud as an OpenAI-compatible provider out of the box, which feels like a sponsorship integration dressed up as a feature.
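To make the schema-driven idea concrete, here is a minimal sketch of the kind of schema you might define, plus a check on what an LLM hands back. The field names (`company_mission`, `employee_count`, `is_open_source`) and the `conformsTo` helper are hypothetical, invented for illustration. They are not AnyCrawl's API, and the way a schema is actually attached to a request is something to confirm in the project's own docs.

```typescript
// Hypothetical schema in JSON Schema style; field names are illustrative only.
type FieldType = "string" | "number" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: FieldType }>;
  required: string[];
}

const companySchema: ObjectSchema = {
  type: "object",
  properties: {
    company_mission: { type: "string" },
    employee_count: { type: "number" },
    is_open_source: { type: "boolean" },
  },
  required: ["company_mission", "is_open_source"],
};

// Returns a list of problems; an empty list means the object fits the schema.
// Only checks required keys and primitive types, not the full JSON Schema spec.
function conformsTo(schema: ObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["value is not an object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in record)) problems.push(`missing required field: ${key}`);
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in record && typeof record[key] !== spec.type) {
      problems.push(`field ${key} should be ${spec.type}`);
    }
  }
  return problems;
}

// Example: a reply an LLM might plausibly produce, parsed from JSON text.
// Note that employee_count came back as a string instead of a number.
const reply: unknown = JSON.parse(
  '{"company_mission": "Make crawling easy", "employee_count": "about 40", "is_open_source": true}'
);
const problems = conformsTo(companySchema, reply);
```

LLM output is not guaranteed to match the schema you asked for, so a check like this before storing results is cheap insurance. For anything beyond flat objects, a full JSON Schema validator is the better choice.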

Key highlights

Three engines: Cheerio for static HTML, Playwright and Puppeteer for JS-rendered pages
Site crawling with depth limits, domain scoping, and path include/exclude rules
Built-in proxy support plus a default proxy (details vague; “high-quality” is the README’s word)
Redis-backed caching with S3 support for self-hosted deployments
Multi-threading and multi-process batch processing
MIT licensed
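
The crawl controls listed above (depth limits, domain scoping, path include/exclude rules) boil down to a per-URL filter. The sketch below shows one way such a filter could work. The `CrawlRules` shape and the `shouldVisit` function are hypothetical names chosen for illustration; AnyCrawl's actual option names and matching semantics may differ.

```typescript
// Hypothetical rule set; names are illustrative, not AnyCrawl's real options.
interface CrawlRules {
  seed: string;           // starting URL; its hostname defines the crawl scope
  maxDepth: number;       // 0 = seed page only
  includePaths: RegExp[]; // if non-empty, the path must match at least one
  excludePaths: RegExp[]; // a path matching any of these is skipped
}

// Decide whether a discovered link should be queued for crawling.
function shouldVisit(rules: CrawlRules, link: string, depth: number): boolean {
  // Depth limit: links found too far from the seed are dropped.
  if (depth > rules.maxDepth) return false;

  let target: URL;
  let seedUrl: URL;
  try {
    seedUrl = new URL(rules.seed);
    // Resolve relative links against the seed URL.
    target = new URL(link, seedUrl);
  } catch {
    return false; // unparseable seed or link
  }

  // Domain scoping: stay on the seed's hostname.
  if (target.hostname !== seedUrl.hostname) return false;

  // Exclude rules win over include rules.
  const path = target.pathname;
  if (rules.excludePaths.some((re) => re.test(path))) return false;
  if (rules.includePaths.length > 0 && !rules.includePaths.some((re) => re.test(path))) {
    return false;
  }
  return true;
}

const rules: CrawlRules = {
  seed: "https://example.com/docs/",
  maxDepth: 2,
  includePaths: [/^\/docs\//],
  excludePaths: [/\/archive\//],
};

const inScope = shouldVisit(rules, "/docs/intro", 1);
const offDomain = shouldVisit(rules, "https://other.example.org/docs/x", 1);
```

Letting exclude rules take precedence over include rules is a design choice made here, not something the README specifies. It keeps the filter predictable when the two sets overlap.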

Caveats

“Multiple search engines” is overstated: Google is the only engine listed as supported
The README is heavy on badges and sponsor banners; actual technical depth lives in external docs
No benchmarks, rate-limiting details, or cost estimates for the LLM extraction path

Verdict

Worth a look if you’re building RAG pipelines or AI agents and want a single self-hosted box that can crawl, cache, and structure data via LLM. Skip it if you need reliable multi-engine SERP today or want deep visibility into how the extraction prompts actually behave.
