soulteary/docker-llama2-chat
A Docker-based project for deploying LLaMA2 models locally in three steps, with support for GPU inference, CPU inference, and quantized variants.

Velocity (7d): +0.5 ★/day · Trend: steady
The repository provides Docker configurations and tooling for running LLaMA2 models locally with minimal setup. It supports the official Meta LLaMA2 models (7B and 13B), Chinese-adapted variants, and quantized versions via llama.cpp/GGML. Models can run on a GPU (8-14 GB of VRAM), on a GPU with Transformers quantization (5 GB of VRAM), or on CPU only using llama.cpp.
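
As a rough illustration of how the three options relate to available hardware, the sketch below maps a VRAM figure to one of the modes described above. The function name `pick_runtime` and the mode labels are hypothetical and not part of the repository; the thresholds are taken from the figures quoted above, and actual requirements likely depend on model size (7B vs 13B).

```python
def pick_runtime(vram_gb: float) -> str:
    """Map available GPU memory (in GB) to one of the three modes above.

    Hypothetical helper for illustration only; the labels returned here
    are not identifiers used by the repository.
    """
    if vram_gb >= 8:
        # Standard GPU inference: the summary quotes 8-14 GB of VRAM.
        return "gpu"
    if vram_gb >= 5:
        # GPU inference with Transformers quantization: 5 GB of VRAM.
        return "gpu-quantized"
    # Otherwise fall back to CPU-only inference via llama.cpp.
    return "cpu"


if __name__ == "__main__":
    for vram in (0, 6, 12):
        print(f"{vram} GB VRAM: {pick_runtime(vram)}")
```

A machine without a usable GPU can pass `0`, which selects the CPU-only path.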