← all repositories

gotzmann/llama.go

A pure Go implementation of LLaMA model inference based on llama.cpp.

llama.go
Velocity · 7d
+1.2
★ / day
Trend
steady
star history

This project reimplement llama.cpp in pure Golang to enable running LLaMA models locally without GPU clusters. It implements the LLaMA neural network architecture, supports tensor math in pure Go, and includes model loading for 7B through 65B parameter models. The library also supports cross-platform deployment (Mac, Linux, Windows), ARM NEON optimization for Apple Silicon, and AVX2 for x64 architectures. The V2 roadmap includes LLaMA V2 support with grouped query attention and INT8 quantization.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.