Is web-llm open source?

Yes — mlc-ai/web-llm is open source, released under the Apache-2.0 license.

What language is web-llm written in?

mlc-ai/web-llm is primarily written in TypeScript.

How popular is web-llm?

mlc-ai/web-llm has 18.4k stars on GitHub, and its star growth is currently accelerating (about 9.4 new stars per day over the past 7 days).

Where can I find web-llm?

mlc-ai/web-llm is on GitHub at https://github.com/mlc-ai/web-llm.

mlc-ai/web-llm

LLM inference that never leaves the browser tab

WebLLM runs open-source LLMs directly inside the browser via WebGPU and exposes them through a local OpenAI-compatible API.

★18.4k stars TypeScript Inference · Serving Language Models

Velocity (7d): +9.4 ★ / day

Trend: ↗ accelerating

What it does

WebLLM is an inference engine that downloads and executes quantized LLMs (including Llama, Phi, Mistral, Qwen, and others) entirely within the browser using WebGPU acceleration. It exposes a local MLCEngine that mimics the OpenAI chat-completions interface, so existing client code can point to an offline model without rewriting requests. No server round-trips are required after the initial model fetch.
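
To illustrate why an OpenAI-shaped interface lets client code stay unchanged, here is a minimal sketch. The types and the names `ChatEngineLike`, `buildChatRequest`, and `ask` are hypothetical and defined locally for illustration; they are not imported from WebLLM, and the real engine's types are richer.

```typescript
// Minimal local types mirroring the OpenAI chat-completions shape.
// Illustrative only: these are assumptions, not WebLLM's own type definitions.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  temperature?: number;
}

interface ChatResponse {
  choices: { message: { role: Role; content: string | null } }[];
}

// Anything exposing chat.completions.create() fits this shape,
// whether it is a hosted client or a local in-browser engine.
interface ChatEngineLike {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Build the request once; it does not depend on where the model runs.
function buildChatRequest(systemPrompt: string, userPrompt: string): ChatRequest {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt },
    ],
    temperature: 0.7,
  };
}

// Send the request to whichever engine was passed in and return the reply text.
async function ask(engine: ChatEngineLike, userPrompt: string): Promise<string> {
  const reply = await engine.chat.completions.create(
    buildChatRequest("You are a helpful assistant.", userPrompt),
  );
  return reply.choices[0]?.message.content ?? "";
}
```

In a browser, the `engine` argument would presumably be the object returned by WebLLM's engine factory, possibly behind a type cast. The calling code above would not need to know whether the model is remote or local.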

The interesting bit

Instead of treating the browser as a thin client, WebLLM compiles models to an MLC format and pushes heavy computation into Web Workers or Service Workers, keeping the main thread free while still leveraging GPU acceleration.

Key highlights

Runs fully offline after the first model download, with no backend infrastructure.
OpenAI API compatibility for streaming, JSON-mode structured generation, and logit control.
Supports multiple cache backends: Cache API, IndexedDB, OPFS, and an experimental Cross-Origin Storage API.
Broad model support including Llama 3, Phi 3, Gemma, Mistral, and Qwen.
Offloads inference to Web Workers or Service Workers to protect UI responsiveness.
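
The streaming and JSON-mode highlights can be sketched as follows. This assumes OpenAI-style streaming chunks, where each chunk carries a text fragment in `choices[0].delta.content`, and a `response_format` of `{ type: "json_object" }` for JSON mode. The type and function names are hypothetical and defined locally, not taken from WebLLM.

```typescript
// Local, simplified types for OpenAI-style streaming chunks (illustrative only).
interface StreamChunk {
  choices: { delta: { content?: string | null } }[];
}

interface StreamingRequest {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream: true;
  response_format?: { type: "json_object" };
}

// Pull the text fragment out of one chunk; chunks without text yield "".
function deltaText(chunk: StreamChunk): string {
  return chunk.choices[0]?.delta.content ?? "";
}

// Concatenate fragments as they arrive, calling onToken so a UI can render incrementally.
async function collectStream(
  chunks: AsyncIterable<StreamChunk>,
  onToken: (fragment: string) => void = () => {},
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    const fragment = deltaText(chunk);
    if (fragment) {
      text += fragment;
      onToken(fragment);
    }
  }
  return text;
}

// A streaming request that also asks for JSON-mode output.
function jsonStreamingRequest(userPrompt: string): StreamingRequest {
  return {
    messages: [
      { role: "system", content: "Reply with a single JSON object." },
      { role: "user", content: userPrompt },
    ],
    stream: true,
    response_format: { type: "json_object" },
  };
}
```

Because `collectStream` accepts any `AsyncIterable`, the same helper could consume a stream from a hosted API or from an in-browser engine, provided the chunk shape matches the assumption above.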

Caveats

Function calling is marked work-in-progress.
First-time model downloads can be slow; the README warns you must handle the asynchronous load gracefully.
The experimental Cross-Origin Storage backend requires a Chrome extension and lacks programmatic cache clearing.
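
One way to handle the slow first download gracefully is to model loading as explicit UI state. The sketch below assumes the loader reports progress as a fraction between 0 and 1 plus a status string; that report shape, and the names `LoadProgress`, `LoadState`, `onProgress`, `canChat`, and `loadModel`, are hypothetical and not taken from WebLLM.

```typescript
// Assumed shape of a load-progress report: progress is a fraction in [0, 1].
interface LoadProgress {
  progress: number;
  text: string;
}

type LoadState =
  | { status: "idle" }
  | { status: "loading"; percent: number; detail: string }
  | { status: "ready" }
  | { status: "error"; message: string };

// Map a progress report to UI state, clamping out-of-range values.
function onProgress(report: LoadProgress): LoadState {
  const fraction = Math.min(1, Math.max(0, report.progress));
  return { status: "loading", percent: Math.round(fraction * 100), detail: report.text };
}

// The chat input should stay disabled until the model is fully loaded.
function canChat(state: LoadState): boolean {
  return state.status === "ready";
}

// Wrap any async loader so a failed download becomes visible state
// instead of an unhandled promise rejection.
async function loadModel(
  load: (report: (r: LoadProgress) => void) => Promise<void>,
  setState: (s: LoadState) => void,
): Promise<void> {
  setState({ status: "loading", percent: 0, detail: "starting" });
  try {
    await load((r) => setState(onProgress(r)));
    setState({ status: "ready" });
  } catch (err) {
    setState({ status: "error", message: err instanceof Error ? err.message : String(err) });
  }
}
```

The `load` argument is where the actual engine initialization would go, with the engine's own progress callback forwarded to `report`.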

Verdict

Front-end developers who want private, client-side AI or need to drop server inference costs should look here. If you need multi-user concurrency or massive context windows beyond consumer GPU limits, this is the wrong stack.
