oxylabs/oxylabs-ai-studio-py

Scraping by talking to the web like it's a chatbot

A Python SDK that turns natural language prompts into structured data extraction via Oxylabs' hosted AI scraping services.

2.9k stars · Python · Data Tooling · LLMOps · Eval
Velocity (7d): +8.3 ★/day · Trend: steady

What it does

This is a thin Python wrapper around Oxylabs’ commercial AI Studio API. You instantiate a client with your API key, then ask it to crawl, scrape, search, or drive a browser agent using plain English prompts. The heavy lifting—rendering JavaScript, proxy rotation, geo-targeting, and the actual LLM-powered extraction—happens on Oxylabs’ servers, not your machine.

The interesting bit

The SDK auto-generates JSON schemas from natural language descriptions (generate_schema), which then feed structured extraction. It’s a neat convenience, though the README doesn’t explain what model or logic drives that generation. The search endpoint also auto-routes to an instant (non-polling) path when limit <= 10 and you don’t need content back—a small but sensible optimization.
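
The routing rule described above can be sketched in a few lines. This is an illustration of the stated condition only, not the SDK's actual code: the function name `choose_search_path`, the parameter `return_content`, and the labels "instant" and "polling" are hypothetical names chosen for this example.

```python
def choose_search_path(limit: int, return_content: bool) -> str:
    """Pick a search path using the rule described in the review.

    Hypothetical helper: small requests that do not need page content
    can be answered immediately; everything else is assumed to go
    through a submit-then-poll job.
    """
    if limit <= 10 and not return_content:
        return "instant"
    return "polling"


# A small, content-free query takes the instant path.
small_query = choose_search_path(limit=10, return_content=False)

# Asking for more results, or for page content, falls back to polling.
large_query = choose_search_path(limit=11, return_content=False)
content_query = choose_search_path(limit=5, return_content=True)
```

The point of such a rule is that callers with small, content-free queries skip the wait of a polling loop without having to choose a different method themselves.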

Key highlights

  • Five tools in one SDK: AiCrawler, AiScraper, BrowserAgent, AiSearch, and AiMap
  • Output formats include markdown, JSON, CSV, HTML, screenshots, and something called "toon" (unexplained in the README)
  • Geo-location targeting via proxy, with ISO2 codes or canonical country names
  • Async versions of every method
  • Python 3.10+ required; install via pip install oxylabs-ai-studio
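
Async variants matter mostly when you want to fan out many requests at once. A minimal sketch of that pattern, assuming a stand-in coroutine: `fake_scrape` is hypothetical and simulates a remote call with a short sleep, so nothing here is an actual SDK method or touches the network.

```python
import asyncio


async def fake_scrape(url: str) -> dict:
    """Hypothetical stand-in for an async scrape call (not an SDK method)."""
    await asyncio.sleep(0.01)  # simulate waiting on a remote service
    return {"url": url, "status": "ok"}


async def scrape_all(urls: list[str]) -> list[dict]:
    """Run all scrapes concurrently; gather returns results in input order."""
    return await asyncio.gather(*(fake_scrape(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(scrape_all(urls))
```

With a real async client, the stand-in would be swapped for the SDK's own coroutine; the `asyncio.gather` structure stays the same.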

Caveats

  • Requires an Oxylabs API key and presumably paid credits; the max_credits parameter suggests metered usage
  • "toon" output format appears twice with zero explanation
  • The README is essentially API reference; no architecture details, pricing, or rate limits disclosed

Verdict

Useful if you’re already bought into Oxylabs’ ecosystem and want to orchestrate their AI scraping tools from Python without hand-rolling HTTP calls. Skip it if you need self-hosted scraping, transparent pricing, or fine-grained control over the extraction pipeline.
