gotzmann/llama.go

A pure Go implementation of LLaMA model inference based on llama.cpp.

★1.4k stars · Go · Inference · Serving Language Models

Velocity (7d): +1.2 ★/day · Trend: steady

This project reimplements llama.cpp in pure Go, making it possible to run LLaMA models locally without GPU clusters. It implements the LLaMA neural network architecture, performs its tensor math in pure Go, and can load models from 7B up to 65B parameters. It also supports cross-platform deployment (Mac, Linux, Windows), with ARM NEON optimizations for Apple Silicon and AVX2 optimizations for x64 CPUs. The V2 roadmap includes LLaMA V2 support (with grouped-query attention) and INT8 quantization.
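During token-by-token generation, most of the work in a LLaMA forward pass is matrix-vector multiplication in the attention projections and feed-forward layers. The following is a minimal sketch of what that kind of tensor math can look like in pure Go: a row-major matrix-vector product whose rows are split across goroutines. The function name `MatVec` and its signature are hypothetical and are not llama.go's API; this scalar sketch also omits the NEON/AVX2 SIMD paths the project uses.

```go
package main

import (
	"fmt"
	"sync"
)

// MatVec computes out = W·x, where W is a rows×cols matrix stored row-major
// in a flat slice. Hypothetical helper for illustration, not llama.go's API.
// Rows are split into contiguous chunks processed by up to `workers` goroutines.
func MatVec(w []float32, rows, cols int, x []float32, workers int) []float32 {
	if len(w) != rows*cols || len(x) != cols {
		panic("MatVec: dimension mismatch")
	}
	if workers < 1 {
		workers = 1
	}
	out := make([]float32, rows)
	chunk := (rows + workers - 1) / workers // ceiling division: rows per goroutine

	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for r := start; r < end; r++ {
				row := w[r*cols : (r+1)*cols]
				var sum float32
				for c, v := range row {
					sum += v * x[c] // plain scalar dot product, no SIMD
				}
				out[r] = sum // each goroutine writes a disjoint range of out
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	// 3×2 matrix [[1 2] [3 4] [5 6]] multiplied by the vector [1 1].
	w := []float32{1, 2, 3, 4, 5, 6}
	x := []float32{1, 1}
	fmt.Println(MatVec(w, 3, 2, x, 2)) // prints [3 7 11]
}
```

Splitting by rows keeps every goroutine's writes to `out` disjoint, so no locking is needed; the `sync.WaitGroup` only waits for all chunks to finish.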