Is localllm open source?

Yes — GoogleCloudPlatform/localllm is open source, released under the Apache-2.0 license.

What language is localllm written in?

GoogleCloudPlatform/localllm is primarily written in Python.

How popular is localllm?

GoogleCloudPlatform/localllm has 1.5k stars on GitHub.

Where can I find localllm?

GoogleCloudPlatform/localllm is on GitHub at https://github.com/GoogleCloudPlatform/localllm.

← all repositories

GoogleCloudPlatform/localllm

Your Cloud Workstation can now host its own quantized LLM

It wraps llama-cpp-python and Hugging Face downloads into a small CLI so you can serve quantized models directly on a Cloud Workstation without calling remote APIs.

★1.5k stars Python Inference · Serving Language Models

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

local-llm is a thin Python CLI that downloads quantized .gguf models from Hugging Face Hub and serves them using llama-cpp-python’s built-in web server. It is packaged as a Docker image for Google Cloud Workstations, though you can also run it on any local machine. The tool handles the busywork of model caching, filename selection, and process management so you don’t have to babysit the server manually.

The interesting bit

The project treats a remote Cloud Workstation like a beefy local laptop—an e2-standard-32 instance with 128 GB RAM—and keeps inference local once the model is downloaded. It is essentially well-mannered glue code: it auto-selects 4-bit medium quantization for popular TheBloke repositories if you omit the filename, then exposes the standard OpenAPI docs interface for chatting.

Key highlights

Only speaks .gguf; it assumes your models live in the standard Hugging Face Hub cache directory.
Provides a small set of CLI verbs: list, ps, run, kill, pull, and rm.
Ships with a Dockerfile and Cloud Build config to bake the tool into a custom Cloud Workstation image.
Defaults to 4-bit medium quantization for TheBloke models when no specific filename is provided.
Serves an OpenAPI documentation endpoint via the underlying llama-cpp-python web server.

Caveats

Only .gguf files are supported; other model formats are ignored.
Logs from multiple concurrently running models interleave into a single file, which can complicate debugging.
The README dedicates most of its real estate to GCP infrastructure setup rather than the tool itself.

Verdict

Reach for this if you are already living inside GCP Cloud Workstations and want a quick, offline-capable LLM sandbox without metered API calls. Skip it if you need fine-tuning, non-quantized models, or a fully featured model server like vLLM.

Frequently asked

What is GoogleCloudPlatform/localllm?: It wraps llama-cpp-python and Hugging Face downloads into a small CLI so you can serve quantized models directly on a Cloud Workstation without calling remote APIs.
Is localllm open source?: Yes — GoogleCloudPlatform/localllm is open source, released under the Apache-2.0 license.
What language is localllm written in?: GoogleCloudPlatform/localllm is primarily written in Python.
How popular is localllm?: GoogleCloudPlatform/localllm has 1.5k stars on GitHub.
Where can I find localllm?: GoogleCloudPlatform/localllm is on GitHub at https://github.com/GoogleCloudPlatform/localllm.