withcatai/node-llama-cpp
Node.js bindings enabling local LLM inference via llama.cpp with Metal, CUDA, and Vulkan GPU support.

Star velocity (7-day): +2.0 ★/day · Trend: steady
This library wraps llama.cpp to provide a Node.js interface for running large language models locally. It supports GPU acceleration through Metal, CUDA, and Vulkan, ships pre-built binaries to simplify installation, and can constrain generation to structured formats, such as output that matches a JSON schema. It also provides embedding generation, function calling, and a CLI for chatting with a model without writing code.
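Embedding generation produces numeric vectors, and a common next step is to compare them with cosine similarity (for example, to rank documents against a query). The sketch below is plain TypeScript with no dependency on node-llama-cpp. The toy 3-dimensional vectors stand in for real model embeddings, and `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers, not part of the library's API. How you obtain the vectors from a loaded model depends on the library version, so check its documentation for the exact calls.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
    if (a.length !== b.length) {
        throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) {
        return 0; // treat a zero vector as unrelated instead of dividing by zero
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by similarity to a query embedding, most similar first.
function rankBySimilarity(
    query: readonly number[],
    docs: ReadonlyArray<{ text: string; vector: readonly number[] }>
): Array<{ text: string; score: number }> {
    return docs
        .map((d) => ({ text: d.text, score: cosineSimilarity(query, d.vector) }))
        .sort((x, y) => y.score - x.score);
}

// Hypothetical toy vectors standing in for embeddings returned by a model.
const ranked = rankBySimilarity([1, 0, 0], [
    { text: "orthogonal", vector: [0, 1, 0] },
    { text: "identical", vector: [2, 0, 0] },
    { text: "diagonal", vector: [1, 1, 0] },
]);
console.log(ranked.map((r) => r.text)); // [ 'identical', 'diagonal', 'orthogonal' ]
```

Cosine similarity ignores vector magnitude, which is why `[2, 0, 0]` scores exactly 1 against `[1, 0, 0]`; if your embeddings are already normalized to unit length, the dot product alone gives the same ranking.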