handrew/browserpilot

Selenium, but you yell instructions at it in English

BrowserPilot turns natural language into Selenium code via GPT-3, for developers who'd rather write "click the big blue button" than XPath.

Velocity (7d): +0.5 ★/day · Trend: steady

What it does

BrowserPilot takes a plain-English instruction list, sends it to GPT-3, and gets back Selenium code, which it then executes. You write things like “Find all textareas. Click the first visible one. Type ‘buffalo buffalo buffalo’ and press enter.” The agent compiles this to Python, runs it, and can cache the compiled output to skip future API calls.
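The caching idea can be sketched in a few lines. This is not BrowserPilot's code: `compile_with_cache` and `fake_compile` are hypothetical names, and an in-memory dict stands in for whatever file the project actually writes. The point is only that identical instructions should trigger one compile call, not two.

```python
import hashlib
from typing import Callable, Dict, List


def compile_with_cache(
    instructions: str,
    compile_fn: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    """Return compiled code, calling compile_fn only on a cache miss."""
    # Key on a hash of the exact instruction text (an assumption of this sketch).
    key = hashlib.sha256(instructions.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = compile_fn(instructions)
    return cache[key]


calls: List[str] = []


def fake_compile(text: str) -> str:
    # Hypothetical stand-in for the LLM call; records every invocation.
    calls.append(text)
    return "# compiled from: " + text.splitlines()[0]


cache: Dict[str, str] = {}
prompt = "Find all textareas.\nClick the first visible one."
first = compile_with_cache(prompt, fake_compile, cache)
second = compile_with_cache(prompt, fake_compile, cache)
print(len(calls))  # 1: the second request was served from the cache
```

Any change to the instruction text, even whitespace, produces a new key and therefore a new compile call.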

The interesting bit

The project is essentially a very elaborate prompt: a fixed “vocabulary” of actions (find, click, scroll, ask_llm_to_find_element, etc.) is described to GPT-3 in the system prompt, and GPT-3 must map your English onto those exact method names. The author notes this is “more like writing code with Copilot than talking to a friend”: you still need to think like a DOM programmer, just without the syntax.
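A rough illustration of the fixed-vocabulary approach, under stated assumptions: the action names are the four listed above (the real list is longer), while the `env` receiver, the prompt wording, and both function names are invented for this sketch and are not taken from the project. It builds a prompt that advertises the allowed methods, then checks generated code for calls outside that set.

```python
import ast
from typing import List, Set

# Partial vocabulary: only the names mentioned in the text above.
ALLOWED_ACTIONS: Set[str] = {"find", "click", "scroll", "ask_llm_to_find_element"}


def build_prompt(instructions: str, actions: Set[str]) -> str:
    """Describe the allowed methods, then append the user's instructions."""
    vocab = "\n".join(f"- env.{name}(...)" for name in sorted(actions))
    return (
        "You may only call these methods:\n"
        f"{vocab}\n\nInstructions:\n{instructions}\n\nPython:"
    )


def unknown_calls(code: str, actions: Set[str], receiver: str = "env") -> List[str]:
    """List methods called on `receiver` that are not in the vocabulary."""
    unknown = []
    for node in ast.walk(ast.parse(code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == receiver
            and node.func.attr not in actions
        ):
            unknown.append(node.func.attr)
    return sorted(unknown)


# Hypothetical model output containing one method outside the vocabulary.
generated = "el = env.find('textarea')\nenv.click(el)\nenv.teleport(el)"
print(unknown_calls(generated, ALLOWED_ACTIONS))  # ['teleport']
```

A check like this only catches invented method names; it says nothing about whether the allowed calls do what the English asked for.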

Key highlights

  • Supports reusable functions via BEGIN_FUNCTION / END_FUNCTION blocks
  • Includes a Memory module for querying past browsed pages via embeddings
  • Can output compiled instructions to YAML to avoid repeat API costs
  • Added Selenium Grid support in recent versions for remote execution
  • Ships with a Studio CLI for iterative prompt testing
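To make the BEGIN_FUNCTION / END_FUNCTION highlight concrete, here is a minimal parser sketch. It assumes the function name follows BEGIN_FUNCTION on the same line and that blocks do not nest; the project's own parsing may differ. `split_functions` and the sample instructions (including the "Run function" line) are hypothetical.

```python
from typing import Dict, List, Tuple


def split_functions(text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Separate named function blocks from top-level instructions."""
    functions: Dict[str, List[str]] = {}
    main: List[str] = []
    current = None  # name of the block being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("BEGIN_FUNCTION"):
            if current is not None:
                raise ValueError("nested BEGIN_FUNCTION")
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError("BEGIN_FUNCTION needs a name")
            current = parts[1]
            functions[current] = []
        elif line == "END_FUNCTION":
            if current is None:
                raise ValueError("END_FUNCTION without BEGIN_FUNCTION")
            current = None
        elif current is not None:
            functions[current].append(line)
        else:
            main.append(line)
    if current is not None:
        raise ValueError("missing END_FUNCTION")
    return functions, main


script = """BEGIN_FUNCTION search_buffalo
Find all textareas.
Click the first visible one.
END_FUNCTION
Go to google.com.
Run function search_buffalo."""

functions, main = split_functions(script)
print(functions)
print(main)
```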

Caveats

  • Security: GPT-3’s output is run through Python exec(); the README explicitly warns this is unsafe
  • Requires Chromedriver setup and an OpenAI API key; not a standalone browser
  • GPT-3.5-turbo “takes too many freedoms” and keeps trying to import modules; the author strips those imports manually
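One way to strip import statements from generated code before it reaches exec() is to rewrite the syntax tree. This is a sketch, not the author's method (which is not shown here), and it needs Python 3.9+ for `ast.unparse`. Each import is replaced with `pass` so that a block containing only an import stays syntactically valid.

```python
import ast


class _DropImports(ast.NodeTransformer):
    """Replace every import statement with `pass`."""

    def visit_Import(self, node: ast.Import) -> ast.AST:
        return ast.Pass()

    def visit_ImportFrom(self, node: ast.ImportFrom) -> ast.AST:
        return ast.Pass()


def strip_imports(code: str) -> str:
    """Return `code` with import statements replaced by `pass`."""
    tree = _DropImports().visit(ast.parse(code))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Hypothetical model output; `env` and `el` are placeholder names.
generated = "import os\nfrom sys import path\nenv.click(el)"
cleaned = strip_imports(generated)
print(cleaned)
```

This does not make exec() safe: code can still reach modules through `__import__` or other built-ins, so the security caveat stands.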

Verdict

Worth a look if you maintain brittle Selenium suites and want to experiment with LLM-generated locators. Skip it if you need reliability or security, or have strong feelings about exec() running untrusted code.
