Selenium, but you yell instructions at it in English
BrowserPilot turns natural language into Selenium code via GPT-3, for developers who'd rather write "click the big blue button" than XPath.

What it does
BrowserPilot takes a plain-English instruction list, feeds it to GPT-3, and executes the Selenium code that comes back. You write things like “Find all textareas. Click the first visible one. Type ‘buffalo buffalo buffalo’ and press enter.” The agent compiles this to Python, runs it, and can even cache the compiled output to skip future API calls.
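To make the "compile once, cache, skip the API call" idea concrete, here is a minimal sketch. It is not BrowserPilot's code: `compile_with_cache`, `compile_fn`, and the cache file layout are hypothetical names for illustration, and JSON stands in for the project's YAML output so the example needs only the standard library.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def compile_with_cache(
    instructions: str,
    compile_fn: Callable[[str], str],  # hypothetical: wraps the LLM call
    cache_path: Path,
) -> str:
    """Return compiled code for `instructions`, calling `compile_fn` only on a cache miss."""
    # Load the existing cache file, or start with an empty cache.
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}

    # Key on a hash of the instruction text, so any edit triggers a recompile.
    key = hashlib.sha256(instructions.encode("utf-8")).hexdigest()

    if key not in cache:
        cache[key] = compile_fn(instructions)  # the expensive step
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

Keying on a hash of the instruction text means an unchanged script never hits the API twice, while any edit to the English forces a fresh compile.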
The interesting bit
The project is essentially a very elaborate prompt: a fixed “vocabulary” of actions (find, click, scroll, ask_llm_to_find_element, etc.) is described to GPT-3 in the system prompt, and GPT-3 must map your English to those exact method names. The author notes this is “more like writing code with Copilot than talking to a friend”: you still need to think like a DOM programmer, just without the syntax.
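A fixed vocabulary also makes the model's output checkable. The sketch below shows one way that could work, under assumptions: the four action names are the ones listed above, but the `env` object name, the prompt wording, and both helper functions are hypothetical and not taken from the project.

```python
import ast

# Assumed vocabulary: only the action names mentioned in the text above.
ALLOWED_ACTIONS = {"find", "click", "scroll", "ask_llm_to_find_element"}


def build_prompt(instructions: str) -> str:
    """Describe the allowed methods to the model, then append the user's English."""
    actions = ", ".join(sorted(ALLOWED_ACTIONS))
    return (
        "Translate the instructions into Python calls on an object named env.\n"
        f"Use only these methods: {actions}.\n\n"
        f"Instructions:\n{instructions}\n\nPython:\n"
    )


def unknown_actions(generated_code: str) -> set:
    """Return method names called directly on `env` that are outside the vocabulary."""
    unknown = set()
    for node in ast.walk(ast.parse(generated_code)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "env"
            and node.func.attr not in ALLOWED_ACTIONS
        ):
            unknown.add(node.func.attr)
    return unknown
```

For example, `unknown_actions("env.click()\nenv.drag_and_drop()")` flags `drag_and_drop`, which is the kind of drift you would want to catch before running anything.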
Key highlights
- Supports reusable functions via BEGIN_FUNCTION/END_FUNCTION blocks
- Includes a Memory module for querying past browsed pages via embeddings
- Can output compiled instructions to YAML to avoid repeat API costs
- Added Selenium Grid support in recent versions for remote execution
- Ships with a Studio CLI for iterative prompt testing
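As a rough illustration of the reusable-function blocks, here is how an instruction file could be split into named functions and top-level steps. The exact syntax is an assumption (the function name following BEGIN_FUNCTION on the same line, END_FUNCTION alone on its own line), and `split_functions` is a hypothetical helper, not part of BrowserPilot.

```python
def split_functions(text: str):
    """Split instruction text into ({function_name: [steps]}, [top_level_steps])."""
    functions = {}
    main = []
    current = None  # name of the function block being read, if any

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("BEGIN_FUNCTION"):
            if current is not None:
                raise ValueError("nested BEGIN_FUNCTION is not supported")
            current = line[len("BEGIN_FUNCTION"):].strip()
            if not current:
                raise ValueError("BEGIN_FUNCTION needs a name")
            functions[current] = []
        elif line == "END_FUNCTION":
            if current is None:
                raise ValueError("END_FUNCTION without BEGIN_FUNCTION")
            current = None
        elif current is not None:
            functions[current].append(line)  # step inside a function block
        else:
            main.append(line)  # ordinary top-level step

    if current is not None:
        raise ValueError(f"function {current!r} is never closed")
    return functions, main
```

Each function body could then be compiled once and reused, which fits the same cost-saving logic as the YAML output.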
Caveats
- Security: runs GPT-3’s output through Python exec(); the README explicitly warns this is unsafe
- Requires Chromedriver setup and an OpenAI API key; not a standalone browser
- GPT-3.5-turbo “takes too many freedoms” and keeps trying to import modules, which the author manually strips
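For a sense of what stripping imports before exec() can look like, here is a minimal sketch using the standard-library ast module. It is an illustration only, not the author's approach: `strip_imports` and `run_generated` are hypothetical names, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class _DropImports(ast.NodeTransformer):
    """Replace every import statement with `pass` so enclosing blocks stay valid."""

    def visit_Import(self, node):
        return ast.copy_location(ast.Pass(), node)

    # `from x import y` statements get the same treatment.
    visit_ImportFrom = visit_Import


def strip_imports(source: str) -> str:
    """Return `source` with all import statements replaced by `pass`."""
    tree = _DropImports().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # Python 3.9+


def run_generated(source: str, env: object) -> dict:
    """Execute import-stripped code with `env` as the only name provided."""
    namespace = {"env": env}
    exec(strip_imports(source), namespace)  # still NOT a sandbox, see note below
    return namespace
```

This removes only literal import statements. Generated code can still reach `__import__`, `open`, and the other builtins, so it does nothing to make exec() safe for untrusted output.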
Verdict
Worth a look if you maintain brittle Selenium suites and want to experiment with LLM-generated locators. Skip it if you need reliability or security, or if you have strong feelings about exec() running untrusted code.