ngxson/wllama
WebAssembly binding for llama.cpp that enables running LLM inference directly in web browsers using WebGPU and WASM SIMD.

Velocity (7d): +1.4 ★/day · Trend: steady
wllama provides WebAssembly bindings for the llama.cpp C++ library, so large language models can run entirely client-side in the browser with no backend server. It uses WebGPU for GPU acceleration and WebAssembly SIMD, together with multi-threading, for CPU inference. The project supports multimodal inputs (images and audio) and tool calling, and it exposes an OpenAI-compatible API.
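
To make the GPU/CPU split concrete, here is a minimal sketch of how a page might decide which execution path is available before loading a model. It does not use wllama's API: `Backend`, `Capabilities`, `detectCapabilities`, and `pickBackend` are hypothetical names introduced for illustration. It assumes only standard web platform globals (`navigator.gpu` for WebGPU, and `SharedArrayBuffer` plus `crossOriginIsolated` for WebAssembly threads), and it does not probe for SIMD support.

```typescript
// Hypothetical helper types and functions; none of these come from wllama.
type Backend = "webgpu" | "wasm-multi-thread" | "wasm-single-thread";

interface Capabilities {
  webgpu: boolean;       // navigator.gpu is exposed by the browser
  sharedMemory: boolean; // SharedArrayBuffer is usable (page is cross-origin isolated)
}

// Read capabilities from the current global scope.
// Outside a browser these globals are absent, so both flags resolve to false.
function detectCapabilities(): Capabilities {
  const g = globalThis as any;
  return {
    webgpu: Boolean(g.navigator?.gpu),
    sharedMemory:
      typeof g.SharedArrayBuffer !== "undefined" && g.crossOriginIsolated === true,
  };
}

// Assumed preference order: GPU first, then threaded CPU, then single-threaded CPU.
function pickBackend(caps: Capabilities): Backend {
  if (caps.webgpu) return "webgpu";
  if (caps.sharedMemory) return "wasm-multi-thread";
  return "wasm-single-thread";
}

const backend: Backend = pickBackend(detectCapabilities());
console.log(`selected backend: ${backend}`);
```

Keeping `pickBackend` a pure function of a `Capabilities` object makes the decision easy to unit-test without a browser. Threaded WebAssembly generally requires the page to be served with cross-origin isolation headers, which is why `crossOriginIsolated` is checked alongside `SharedArrayBuffer`.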