0hq/WebGPT
A vanilla JavaScript implementation of GPT transformer inference running in web browsers via WebGPU.

WebGPT runs GPT language models directly in the browser, using WebGPU compute shaders for near-native GPU performance. It implements the full transformer architecture (embeddings, multi-head attention, and feedforward layers) entirely in vanilla JS. The project reports benchmark timings (ms/token) on Apple M1 hardware for models ranging from 5M to 1.5B parameters. It ships with pre-converted GPT-2 117M and toy Shakespeare models, and provides scripts for importing custom models.
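
To make the attention step concrete, here is a minimal CPU-side sketch of causal scaled dot-product attention for a single head, written in plain JavaScript. It is an illustration only and is not taken from WebGPT's source: the project runs this kind of computation in WebGPU compute shaders, and the function names `softmax` and `causalAttention`, as well as the nested-array layout `[seqLen][headDim]`, are assumptions made for this example.

```javascript
// Illustrative CPU reference; hypothetical names, not WebGPT's actual code.

// Numerically stable softmax over a plain array of numbers.
function softmax(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Causal scaled dot-product attention for one head.
// q, k, v are arrays of shape [seqLen][headDim].
function causalAttention(q, k, v) {
  const headDim = q[0].length;
  const scale = 1 / Math.sqrt(headDim);

  return q.map((qi, i) => {
    // Token i may only attend to positions 0..i (the causal mask).
    const scores = [];
    for (let j = 0; j <= i; j++) {
      let dot = 0;
      for (let d = 0; d < headDim; d++) dot += qi[d] * k[j][d];
      scores.push(dot * scale);
    }

    const weights = softmax(scores);

    // Output is the attention-weighted sum of the visible value vectors.
    const out = new Array(v[0].length).fill(0);
    weights.forEach((w, j) => {
      for (let d = 0; d < out.length; d++) out[d] += w * v[j][d];
    });
    return out;
  });
}
```

A multi-head layer would, under the same assumptions, run this once per head on separate slices of the projected query, key, and value matrices and concatenate the results. On the GPU the loops are replaced by parallel shader invocations, but the arithmetic is the same.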