oxylabs/ai-crawler-py

Scraping without the selector soup

A Python SDK that lets you crawl websites by describing what you want in plain English, then get back structured JSON or Markdown.

3k stars · Agents · Data Tooling
Velocity (7d): +11 ★ / day · Trend: steady

What it does

oxylabs-ai-studio is a thin Python wrapper around Oxylabs’ hosted AI-Crawler service. You hand it a starting URL and a natural language prompt—“Find all Halo games for Xbox”—and it explores the site, picks relevant pages, and returns extracted data as JSON or Markdown. The SDK also auto-generates OpenAPI-style parsing schemas from plain English descriptions.
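
To make the input shape concrete, here is a minimal sketch of the kind of request described above: a starting URL, a plain-English prompt, and an output format. The `CrawlRequest` class and all of its field names are hypothetical and chosen for illustration; they are not the SDK's actual API, and the example URL is a placeholder.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CrawlRequest:
    """Hypothetical request shape; field names are illustrative, not the SDK's."""

    url: str                          # starting URL the crawler explores from
    user_prompt: str                  # plain-English description of what to find
    output_format: str = "markdown"   # "markdown" or "json"
    render_javascript: bool = False   # assumed flag for JS-rendered pages
    geo_location: Optional[str] = None  # assumed geo-targeting parameter

    def to_payload(self) -> str:
        # Drop unset optional fields so the serialized body stays minimal.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, sort_keys=True)


request = CrawlRequest(
    url="https://example.com/games",  # placeholder URL
    user_prompt="Find all Halo games for Xbox",
)
payload = request.to_payload()
print(payload)
```

The point of the sketch is only that the caller supplies intent rather than selectors; everything else is decided on the service side.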

The interesting bit

The heavy lifting (crawling, rendering JS, geo-targeted proxies, the actual LLM reasoning) all happens on Oxylabs’ servers; this repo is essentially API glue. The value is in the abstraction: no CSS selectors, no XPath, no custom spider logic. You trade control for convenience, which is either liberating or unnerving depending on your trust in black-box AI agents.

Key highlights

  • Natural language prompts drive both URL discovery and data extraction schema generation
  • Supports JavaScript-rendered pages and geo-targeted crawling via parameters
  • Output formats: structured JSON (with schema) or raw Markdown
  • Free trial with 1,000 credits; paid plans start at $12/month
  • Companion JavaScript SDK available for non-Python stacks

Caveats

  • Requires an Oxylabs API key; not self-hostable
  • JSON output mandates a schema, though you can auto-generate one
  • “Experimental” label on the service, and pricing is credit-based with rate limits
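
The schema caveat can be pictured as a simple pre-flight check. The sketch below assumes two output formats, "json" and "markdown", and a hand-written schema in a JSON-Schema-like style; the function name, the format strings, and the schema fields are all hypothetical and not taken from the SDK.

```python
from typing import Optional

# Assumed format names, for illustration only.
VALID_FORMATS = {"json", "markdown"}


def check_output_options(output_format: str, schema: Optional[dict]) -> None:
    """Raise ValueError if the format is unknown or JSON lacks a schema."""
    if output_format not in VALID_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    if output_format == "json" and schema is None:
        raise ValueError("JSON output requires a schema")


# Hypothetical schema for the "Halo games" prompt; fields are illustrative.
games_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "platform": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

check_output_options("markdown", None)      # Markdown needs no schema
check_output_options("json", games_schema)  # JSON with a schema passes
```

In practice the schema could also come from the SDK's auto-generation step instead of being written by hand.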

Verdict

Worth a spin if you need quick, one-off data extraction without maintaining scrapers. Skip it if you need deterministic behavior, fine-grained crawl control, or can’t stomach SaaS lock-in for core infrastructure.
