
GoogleCloudPlatform/localllm

Google Cloud Platform tool for running quantized large language models locally on Cloud Workstations via a llama.cpp web server.

Velocity (7d): +1.7 ★ / day
Trend: steady

This repository provides tooling to run large language models locally using llama-cpp-python’s web server. It includes a Dockerfile for building custom Cloud Workstation base images that bundle the LLM serving infrastructure and use quantized models from Hugging Face. The setup automates provisioning of the GCP infrastructure, including Artifact Registry, Cloud Build, and workstation configuration, for local LLM deployment.
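As a rough illustration of what talking to such a locally served model can look like, the sketch below posts a prompt to the OpenAI-compatible `/v1/completions` endpoint that llama-cpp-python's server exposes. It is not taken from the repository: the helper names `build_completion_request` and `complete` are hypothetical, and the address `http://localhost:8000` is an assumption that should be replaced with whatever host and port the workstation actually serves on.

```python
import json
import urllib.request

# Assumption: the model server is reachable on localhost:8000.
# Change this to match the host/port actually used on the workstation.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64, base_url=BASE_URL, timeout=60):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, max_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return body["choices"][0]["text"]
```

Once a model is being served, calling `complete("Say hello in one sentence.")` should return the generated text. Building the request is kept separate from sending it so the payload can be inspected without a running server.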
