GoogleCloudPlatform/localllm
Google Cloud Platform tool for running quantized large language models locally on Cloud Workstations via a llama.cpp web server.

Velocity · 7d
+1.7
★ / day
Trend
→steady
star history
This repository provides tooling to run large language models locally using llama-cpp-python’s webserver. It includes a Dockerfile for creating custom Cloud Workstation base images that bundle the LLM serving infrastructure, leveraging quantized models from Hugging Face. The setup automates GCP infrastructure provisioning including Artifact Registry, Cloud Build, and workstation configuration for local LLM deployment.