reworkd/tarsier

Vision utility library that tags interactable webpage elements with IDs for LLMs to issue commands like CLICK [23].

1.8k stars · Jupyter Notebook · Agents
Velocity (7d): +1.9 ★ / day · Trend: steady

Tarsier provides visual perception capabilities for web interaction agents by tagging buttons, links, and input fields with brackets and IDs, creating a mapping between DOM elements and identifiers that LLMs can reference. It integrates with Playwright and Selenium to capture webpage state, supports OCR for visual structure understanding, and works alongside GPT-4V for multimodal page understanding. The library is published as a PyPI package and is used by Reworkd to power autonomous web agents across thousands of real tasks.
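The tagging idea can be illustrated with a minimal, self-contained sketch. This is not Tarsier's actual API: the `Element` class and the `tag_elements` and `resolve_command` functions are hypothetical names made up for illustration, and the sketch assumes plain `[ID]` tags and a simple `CLICK [n]` / `TYPE [n] text` command format. It only shows how a mapping from bracketed IDs to DOM locators lets an LLM's text command be resolved back to a concrete element.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for an interactable DOM element."""
    kind: str   # e.g. "button", "link", "input"
    xpath: str  # locator a browser driver could use
    label: str  # visible text shown to the LLM


def tag_elements(elements):
    """Assign sequential IDs and return (page_text, tag_to_xpath).

    page_text is what an LLM would read; tag_to_xpath maps each ID
    back to the element's locator.
    """
    lines = []
    tag_to_xpath = {}
    for tag_id, element in enumerate(elements):
        tag_to_xpath[tag_id] = element.xpath
        lines.append(f"[{tag_id}] {element.label}")
    return "\n".join(lines), tag_to_xpath


# Assumed command grammar: ACTION [id] optional-argument
COMMAND_RE = re.compile(r"^(CLICK|TYPE)\s+\[(\d+)\](?:\s+(.*))?$")


def resolve_command(command, tag_to_xpath):
    """Turn an LLM command such as 'CLICK [1]' into (action, xpath, argument)."""
    match = COMMAND_RE.match(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    action, tag_id, argument = match.group(1), int(match.group(2)), match.group(3)
    if tag_id not in tag_to_xpath:
        raise KeyError(f"no element tagged [{tag_id}]")
    return action, tag_to_xpath[tag_id], argument


elements = [
    Element("input", "//input[@name='q']", "Search"),
    Element("button", "//button[@id='go']", "Go"),
]
page_text, tag_to_xpath = tag_elements(elements)
print(page_text)
print(resolve_command("CLICK [1]", tag_to_xpath))
```

In a real agent, the resolved locator would be handed to a browser driver such as Playwright or Selenium to perform the action; that step is omitted here to keep the sketch free of external dependencies.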
