Is webllama open source?

Yes — McGill-NLP/webllama is open source, released under the MIT license.

What language is webllama written in?

McGill-NLP/webllama is primarily written in Python.

How popular is webllama?

McGill-NLP/webllama has 1.4k stars on GitHub.

Where can I find webllama?

McGill-NLP/webllama is on GitHub at https://github.com/McGill-NLP/webllama.


McGill-NLP/webllama

An 8B Llama that out-navigates zero-shot GPT-4V on real websites

WebLlama fine-tunes Llama-3-8B to browse, click, and chat through websites, then benchmarks it on out-of-domain tasks to avoid the usual demo-video hype.

★1.4k stars Python Agents Language Models LLMOps · Eval




What it does

WebLlama is a research framework that fine-tunes Meta’s Llama-3-8B-Instruct into a web-browsing agent called Llama-3-8B-Web. Trained on a 24K subset of expert-annotated WebLINX 1.0 interactions, the model handles actions like click, textinput, and submit while engaging in multi-turn dialogue. The repository provides training scripts, evaluation code, a Streamlit results viewer, and a BrowserGym integration for testing on 150 real websites.

The interesting bit

The authors are openly skeptical of agent demo culture, insisting that systematic benchmarking across unseen domains, geographies, and screen-free dialogue is the only way to judge real capability. Their 8B model scores 28.8% on out-of-domain WebLINX 1.0 splits, compared to 10.5% for zero-shot GPT-4V, though they acknowledge that millions of websites remain unseen.
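The two scores quoted above can be compared as an absolute difference in percentage points or as a ratio, and the two readings are easy to confuse. The short sketch below works through the arithmetic using only the 28.8% and 10.5% figures from the text; the function name `score_gap` is an illustrative assumption, not part of the repository.

```python
def score_gap(model_score: float, baseline_score: float) -> tuple[float, float]:
    """Return (gap in percentage points, model score as a multiple of the baseline)."""
    points = model_score - baseline_score
    ratio = model_score / baseline_score
    return points, ratio


# Scores quoted in the text: Llama-3-8B-Web vs zero-shot GPT-4V on WebLINX 1.0.
points, ratio = score_gap(28.8, 10.5)
print(f"{points:.1f} percentage points, {ratio:.2f}x the baseline")
```

Read this way, the difference is about 18.3 percentage points, while the 8B model's score is roughly 2.7 times the baseline's.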

Key highlights

Llama-3-8B-Web surpasses zero-shot GPT-4V by over 18 percentage points overall on WebLINX 1.0 (28.8% vs 10.5%), with stronger link selection (34.1% vs 18.9% seg-F1), element targeting (27.1% vs 13.6% IoU), and response alignment (37.5% vs 3.1% chr-F1).
The WebLINX benchmark includes four out-of-domain test splits—new websites, new domains, unseen geographic locations, and dialogue-only browsing where the user cannot see the screen.
The project releases training data, model weights, and a BrowserGym extension (WebLINX 1.1) that adds tab actions and enables comparison with other agent frameworks.
A Streamlit app aggregates evaluation results for quick visualization without custom plotting code.
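The element-targeting figures above are reported as IoU (intersection over union). As a rough illustration of what that metric measures, here is a minimal sketch of the textbook IoU between two axis-aligned boxes. It is not taken from the WebLINX or WebLlama evaluation code, which may compute the score differently; the `Box` type and the example coordinates are assumptions made for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned bounding box in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 when the boxes do not overlap."""
    inter_w = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    inter_h = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    inter = inter_w * inter_h
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up example: the predicted element overlaps half of the target element.
predicted = Box(0, 0, 10, 10)
target = Box(5, 0, 15, 10)
overlap = iou(predicted, target)
print(f"IoU = {overlap:.3f}")
```

A score of 1.0 means the predicted element box matches the target exactly, and 0.0 means the two boxes do not overlap at all.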

Caveats

Playwright and Selenium integrations are listed as goals but not yet available; only BrowserGym is currently supported.
The authors explicitly note that 24K training examples from 150 websites is only a starting point, and generalization to the broader web remains an open problem.

Verdict

Researchers and engineers building open, benchmark-driven web agents should look here for the training pipeline and a strong 8B baseline. If you need a mature, drop-in browser automation tool for production, this is still firmly a research project.
