huggingface/llm-vscode

VS Code extension that swaps Copilot for your own LLM backend

A VS Code extension that brings ghost-text code completion with any model you can point an HTTP request at—Hugging Face, Ollama, or your own endpoint.

1.3k stars · TypeScript · Coding Assistants
Velocity (7 days): +1.1 ★/day · Trend: steady

What it does

llm-vscode is a VS Code extension that provides inline "ghost-text" code completion, similar to GitHub Copilot. It sends your code context to an LLM backend over HTTP and surfaces the suggestion inline. The heavy lifting is done by a bundled Rust language server, llm-ls, which handles tokenization and prompt sizing so the prompt never overflows the model's context window.
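The prompt-sizing idea can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not llm-ls's actual algorithm: tokens are counted by splitting on whitespace (a stand-in for a real Hugging Face tokenizer), and `fitToContextWindow` is a hypothetical helper that keeps whole lines nearest the cursor, filling the text before the cursor first.

```typescript
// Stand-in token counter: whitespace-separated words. llm-ls uses a real
// Hugging Face tokenizer; this approximation is an assumption for the sketch.
type CountTokens = (text: string) => number;

const countWords: CountTokens = (text) =>
  text.split(/\s+/).filter((t) => t.length > 0).length;

interface FittedContext {
  prefix: string; // text kept before the cursor
  suffix: string; // text kept after the cursor
  tokens: number; // tokens used out of the budget
}

// Hypothetical helper: keep whole lines closest to the cursor until the
// token budget is spent. Lines before the cursor are filled first.
function fitToContextWindow(
  prefix: string,
  suffix: string,
  budget: number,
  countTokens: CountTokens = countWords,
): FittedContext {
  let used = 0;

  // Walk backwards from the cursor through the preceding lines.
  const keptPrefix: string[] = [];
  for (const line of prefix.split("\n").reverse()) {
    const cost = countTokens(line);
    if (used + cost > budget) break;
    keptPrefix.unshift(line);
    used += cost;
  }

  // Then walk forwards from the cursor through the following lines.
  const keptSuffix: string[] = [];
  for (const line of suffix.split("\n")) {
    const cost = countTokens(line);
    if (used + cost > budget) break;
    keptSuffix.push(line);
    used += cost;
  }

  return { prefix: keptPrefix.join("\n"), suffix: keptSuffix.join("\n"), tokens: used };
}
```

Trimming at line boundaries keeps the surviving context readable; a real subword tokenizer would report different counts than this word-based approximation, which is why the counter is passed in as a parameter.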

The interesting bit

The standout feature is code attribution: hit Cmd+Shift+A and it runs a rapid Bloom-filter check against The Stack dataset to see whether the generated code already exists somewhere. It's a first-pass filter with acknowledged false positives, but it's more than most completion tools offer. The tokenizer integration is also notably thorough: you can load the tokenizer from a local file, a Hugging Face repository, or an arbitrary HTTP endpoint.
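Why a Bloom filter yields false positives but never false negatives can be shown with a small sketch. The bit-array size, hash count, and FNV-1a-based double hashing below are illustrative assumptions; this is not the scheme or data behind the extension's Stack check.

```typescript
// Minimal Bloom filter: a bit array plus k hash positions per item.
class BloomFilter {
  private readonly bits: Uint8Array;
  private readonly size: number;
  private readonly hashCount: number;

  constructor(size: number, hashCount: number) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(size / 8));
  }

  // 32-bit FNV-1a over UTF-16 code units, with a seed mixed into the offset basis.
  private hash(value: string, seed: number): number {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Double hashing: position_i = (h1 + i * h2) mod size.
  private positions(value: string): number[] {
    const h1 = this.hash(value, 0);
    const h2 = this.hash(value, 0x9e3779b9) | 1; // force an odd step
    const out: number[] = [];
    for (let i = 0; i < this.hashCount; i++) {
      out.push(((h1 + Math.imul(i, h2)) >>> 0) % this.size);
    }
    return out;
  }

  add(value: string): void {
    for (const p of this.positions(value)) {
      this.bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // false => definitely never added; true => possibly added (false positives allowed).
  mightContain(value: string): boolean {
    return this.positions(value).every((p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0);
  }
}
```

A `true` answer only says every probed bit happens to be set, possibly by other items, so the false-positive rate rises as more items share the same bit array. That is exactly why a hit should be treated as a first pass rather than proof.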

Key highlights

  • Supports multiple backends: Hugging Face Inference API, Ollama, OpenAI-compatible APIs, and Hugging Face’s own Text Generation Inference
  • Prompts are automatically resized to fit within the model’s context window using Hugging Face tokenizers
  • Built-in code attribution check against The Stack dataset via Bloom filter
  • Document filters let you restrict suggestions to specific file patterns
  • Reads your existing huggingface-cli token from disk if available
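The document-filter idea amounts to matching a file path against glob patterns. The sketch below is hypothetical: `globToRegExp` and `shouldSuggest` are illustrative names, and only `**`, `*`, and `?` are supported, a small subset of the glob syntax VS Code itself understands.

```typescript
// Convert a small glob subset to a RegExp: `**` may cross directories,
// `*` and `?` stay within one path segment, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          re += "(?:.*/)?"; // `**/` matches zero or more directories
          i += 2;
        } else {
          re += ".*"; // bare `**` matches anything
          i += 1;
        }
      } else {
        re += "[^/]*";
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Offer suggestions only when the path matches at least one pattern
// (an empty pattern list therefore matches nothing in this sketch).
function shouldSuggest(path: string, patterns: readonly string[]): boolean {
  // Normalize Windows separators so one pattern works on every OS.
  const normalized = path.replace(/\\/g, "/");
  return patterns.some((p) => globToRegExp(p).test(normalized));
}
```

For example, `shouldSuggest("src/app/main.py", ["**/*.py"])` is true, while `README.md` is rejected by the same pattern.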

Caveats

  • The free tier of Hugging Face Inference API will rate-limit you; the README nudges toward the Pro plan
  • Non-Hugging Face backends require you to set the endpoint URL yourself; otherwise requests fail with an error
  • The attribution check is explicitly a “rapid first-pass” with possible false positives, not definitive proof
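The URL caveat boils down to simple configuration logic, sketched here. The backend names, the `BackendConfig` shape, `resolveEndpoint`, and the placeholder Hugging Face base URL are all hypothetical, not the extension's real settings or defaults.

```typescript
// Hypothetical backend identifiers; the extension's real setting values may differ.
type Backend = "huggingface" | "ollama" | "openai" | "tgi";

interface BackendConfig {
  backend: Backend;
  model: string;
  url?: string;
}

// Placeholder base for the Hugging Face case; NOT the extension's real default URL.
const HF_DEFAULT_BASE = "https://hf-inference.example/models/";

function resolveEndpoint(cfg: BackendConfig): string {
  // An explicit, non-blank URL always wins.
  if (cfg.url !== undefined && cfg.url.trim() !== "") return cfg.url.trim();
  // Only the Hugging Face backend has a fallback derived from the model id.
  if (cfg.backend === "huggingface") return HF_DEFAULT_BASE + cfg.model;
  // Every other backend has no sensible default, so fail loudly.
  throw new Error(`backend "${cfg.backend}" needs an explicit URL`);
}
```

Failing at resolution time, rather than sending a request to a guessed address, surfaces the misconfiguration immediately.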

Verdict

Worth a look if you want Copilot-style completion but need to run your own model or switch between backends. Skip it if you want something that just works out of the box without API tokens and backend configuration.
