mlc-ai/web-llm
A high-performance in-browser language model inference engine accelerated by WebGPU.

Velocity · 7d: +16 ★ / day
Trend: → steady
WebLLM runs open-source LLMs directly in the web browser, using WebGPU for hardware acceleration and requiring no server-side inference. It exposes an OpenAI-compatible API for local inference and supports features such as streaming and JSON mode. Because prompts and outputs stay on the user's device, the project enables privacy-preserving AI assistants. It serves as a companion to MLC LLM for universal model deployment.
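To make the OpenAI-style streaming interface concrete, here is a minimal sketch of consuming a streamed chat completion. The types `ChatMessage`, `ChatChunk`, and `ChatEngineLike`, and the helpers `streamReply`, `fakeStream`, and `makeFakeEngine`, are hypothetical names introduced for illustration and are not WebLLM exports. They only mirror the OpenAI chunk shape (`choices[0].delta.content`), and the stand-in engine lets the sketch run without WebGPU or a model download.

```typescript
// Illustrative structural types mirroring the OpenAI-style streaming shape.
// These are assumptions for this sketch, not WebLLM's exported types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

interface ChatEngineLike {
  chat: {
    completions: {
      create(request: {
        messages: ChatMessage[];
        stream: true;
      }): Promise<AsyncIterable<ChatChunk>>;
    };
  };
}

// Request a streamed completion and concatenate the text deltas as they arrive.
// onDelta is called once per non-empty delta, e.g. to update the page incrementally.
async function streamReply(
  engine: ChatEngineLike,
  messages: ChatMessage[],
  onDelta: (text: string) => void = () => {},
): Promise<string> {
  const chunks = await engine.chat.completions.create({ messages, stream: true });
  let reply = "";
  for await (const chunk of chunks) {
    const text = chunk.choices[0]?.delta.content ?? "";
    if (text) {
      reply += text;
      onDelta(text);
    }
  }
  return reply;
}

// Hypothetical stand-in stream: yields one chunk per string in `pieces`.
async function* fakeStream(pieces: string[]): AsyncGenerator<ChatChunk> {
  for (const piece of pieces) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Hypothetical stand-in engine that ignores the request and replays `pieces`.
function makeFakeEngine(pieces: string[]): ChatEngineLike {
  return {
    chat: {
      completions: {
        async create() {
          return fakeStream(pieces);
        },
      },
    },
  };
}
```

In a browser, the engine passed to `streamReply` would presumably come from the library itself (the project's README describes a `CreateMLCEngine` factory in the `@mlc-ai/web-llm` package); check the exact call and available model IDs against the current documentation before relying on them.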