
soulteary/docker-llama2-chat

A Docker-based project for deploying LLaMA2 models locally in three steps, with support for GPU inference, CPU inference, and quantized variants.

Velocity (7d): +0.5 ★/day · Trend: steady

The repository provides Docker configurations and tooling for running LLaMA2 models locally with minimal setup. It supports the official Meta LLaMA2 models (7B and 13B), Chinese-adapted variants, and quantized versions via llama.cpp/GGML. Models can run in three ways: on a GPU (8-14 GB of VRAM), on a GPU with Transformers quantization (5 GB of VRAM), or on CPU only using llama.cpp.
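
The VRAM figures quoted above suggest a simple rule for choosing a run mode. The Python sketch below is only an illustration of that rule: the function name `pick_mode` and the mode labels are hypothetical and are not identifiers from the repository, and the thresholds are the 8 GB and 5 GB figures from the summary, treated as lower bounds by assumption.

```python
from typing import Optional

# Thresholds come from the summary text; treating them as hard lower
# bounds is an assumption made for this sketch.
FULL_GPU_MIN_GB = 8.0       # lower end of the quoted 8-14 GB range
QUANTIZED_GPU_MIN_GB = 5.0  # quoted figure for Transformers quantization


def pick_mode(vram_gb: Optional[float]) -> str:
    """Return a hypothetical mode label for the given amount of VRAM.

    Pass None when no GPU is available.
    """
    # No GPU, or too little VRAM even for the quantized path: use CPU.
    if vram_gb is None or vram_gb < QUANTIZED_GPU_MIN_GB:
        return "cpu-llama.cpp"
    # Enough for quantized inference but below the full-precision range.
    if vram_gb < FULL_GPU_MIN_GB:
        return "gpu-transformers-quantized"
    return "gpu"


if __name__ == "__main__":
    for vram in (None, 6, 12):
        print(vram, "->", pick_mode(vram))
```

Whether a given card can hold the 13B model in particular depends on where it falls within the 8-14 GB range, which this sketch does not try to decide.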
