VS Code extension that swaps Copilot for your own LLM backend
A VS Code extension that brings ghost-text code completion with any model you can point an HTTP request at—Hugging Face, Ollama, or your own endpoint.

What it does
llm-vscode is a VS Code extension that provides inline “ghost-text” code completion, similar to GitHub Copilot. It sends your code context to an LLM backend via HTTP and surfaces the suggestion inline. The heavy lifting is done by a bundled Rust binary, llm-ls, which handles tokenization and prompt sizing so the context window isn’t overflowed.
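To make the prompt-sizing idea concrete, here is a minimal sketch of trimming the code around the cursor to a token budget. It is not how llm-ls does it: the function names, the alternating strategy, and the whitespace-based token counter are all assumptions made for illustration. A real setup would count tokens with the model's own tokenizer.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (an assumption): counts whitespace-separated words."""
    return len(text.split())


def fit_context(
    before: List[str],
    after: List[str],
    budget: int,
    counter: Callable[[str], int] = count_tokens,
) -> Tuple[List[str], List[str]]:
    """Keep the lines closest to the cursor until the token budget is spent."""
    kept_before: List[str] = []
    kept_after: List[str] = []
    b = len(before) - 1  # walk upward from the cursor
    a = 0                # walk downward from the cursor
    remaining = budget
    take_before = True
    while b >= 0 or a < len(after):
        # Alternate sides; fall back to the other side when one runs out.
        if (take_before and b >= 0) or a >= len(after):
            line = before[b]
            cost = counter(line)
            if cost > remaining:
                break  # the next line would overflow the budget
            kept_before.insert(0, line)
            b -= 1
        else:
            line = after[a]
            cost = counter(line)
            if cost > remaining:
                break
            kept_after.append(line)
            a += 1
        remaining -= cost
        take_before = not take_before
    return kept_before, kept_after
```

With a budget of 8 "tokens", the call `fit_context(["import os", "def main():", "    x = 1"], ["    return x", "main()"], budget=8)` drops the `import os` line and keeps everything else, because the lines nearest the cursor are claimed first.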
The interesting bit
The standout feature is code attribution: hit Cmd+Shift+A and it runs a rapid Bloom-filter check against The Stack dataset to see if the generated code already exists somewhere. It’s a first-pass filter with acknowledged false positives, but it’s more than most completion tools offer. The tokenizer integration is also notably thorough—you can pull it from a local file, a Hugging Face repo, or an arbitrary HTTP endpoint.
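For readers unfamiliar with Bloom filters, the sketch below shows why such a check is fast and why it can produce false positives but never false negatives. It is a generic illustration, not the extension's implementation: the class name, bit-array size, and SHA-256-based hashing are assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Tiny Bloom filter: membership answers are 'maybe' or 'definitely not'."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every bit is set; unrelated items can collide on all
        # of them, which is where false positives come from.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A lookup touches only a handful of bits regardless of how many snippets were indexed, which is what makes a first-pass check against a very large corpus practical. A hit means "possibly present, verify further"; a miss means the snippet was never added.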
Key highlights
- Supports multiple backends: Hugging Face Inference API, Ollama, OpenAI-compatible APIs, and Hugging Face’s own Text Generation Inference
- Prompts are automatically resized to fit within the model’s context window using Hugging Face tokenizers
- Built-in code attribution check against The Stack dataset via Bloom filter
- Document filters let you restrict suggestions to specific file patterns
- Reads your existing huggingface-cli token from disk if available
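As a rough illustration of the document-filter idea, the sketch below decides whether a file should receive suggestions by matching its path against a list of patterns. The function name and the use of Python's `fnmatch` are assumptions; VS Code's own glob syntax differs in details such as `**`, so treat this as the concept only.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def should_complete(path: str, patterns: Iterable[str]) -> bool:
    """Return True when the path matches at least one allowed pattern."""
    # fnmatchcase is case-sensitive on every platform, so results are stable.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

For example, `should_complete("src/main.py", ["*.py", "*.rs"])` is true, while `should_complete("README.md", ["*.py"])` is false, so Markdown files would get no ghost text under that filter.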
Caveats
- The free tier of Hugging Face Inference API will rate-limit you; the README nudges toward the Pro plan
- Non-Hugging Face backends need an explicitly configured URL; without one, the extension returns an error
- The attribution check is explicitly a “rapid first-pass” with possible false positives, not definitive proof
Verdict
Worth a look if you want Copilot-style completion but need to run your own model or switch between backends. Skip it if you want something that just works out of the box without API tokens and backend configuration.