
ngxson/wllama

WebAssembly binding for llama.cpp that enables running LLM inference directly in web browsers using WebGPU and WASM SIMD.

1.1k stars · TypeScript · Inference · Serving · Language Models
Velocity (7d): +1.4 ★/day · Trend: steady

wllama provides WebAssembly bindings for the llama.cpp C++ library, allowing large language models to run entirely client-side in web browsers without a backend server. It uses WebGPU for GPU acceleration, and WebAssembly SIMD with multi-threading for CPU inference. The project supports multimodal inputs (images and audio) and tool calling, and exposes an OpenAI-compatible API.
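
As a rough illustration of the acceleration paths described above, the sketch below shows how a page might decide which one is available before loading a model. It relies only on standard browser globals (`navigator.gpu`, `crossOriginIsolated`, `SharedArrayBuffer`). The names `Backend`, `Capabilities`, `detectCapabilities`, and `pickBackend` are hypothetical and are not part of wllama; whether and how wllama performs this selection internally is not stated here.

```typescript
// Hypothetical helper, not part of wllama's API.
type Backend = "webgpu" | "wasm-multi-thread" | "wasm-single-thread";

interface Capabilities {
  webgpu: boolean;  // navigator.gpu is exposed by the browser
  threads: boolean; // page is cross-origin isolated and SharedArrayBuffer exists
}

// Probe the current environment using standard web globals only.
function detectCapabilities(): Capabilities {
  const g = globalThis as any;
  return {
    webgpu: Boolean(g.navigator?.gpu),
    threads:
      g.crossOriginIsolated === true &&
      typeof g.SharedArrayBuffer !== "undefined",
  };
}

// Assumed preference order: GPU first, then threaded CPU, then single-threaded CPU.
function pickBackend(caps: Capabilities): Backend {
  if (caps.webgpu) return "webgpu";
  if (caps.threads) return "wasm-multi-thread";
  return "wasm-single-thread";
}

console.log(pickBackend(detectCapabilities()));
```

Browsers expose `SharedArrayBuffer`, which WebAssembly threads depend on, only to cross-origin isolated pages, so the threaded path generally requires the server to send the appropriate COOP and COEP headers.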
