Ghost-text completions without the Copilot tax
A Neovim plugin that brings LLM-powered code suggestions to any model you can host or API you can reach.

What it does
llm.nvim drops “ghost text” code completions into Neovim, Copilot-style, but lets you pick the brain behind the curtain. It talks to a local Rust binary, llm-ls, which handles the messy work of token counting, prompt sizing, and HTTP wrangling so the Lua frontend stays simple.
The interesting bit
The plugin doesn’t assume one backend fits all. It ships with adapters for Hugging Face’s Inference API, Ollama, OpenAI-compatible endpoints, and Hugging Face’s own Text Generation Inference (TGI) server, and it will try to fill in the URL path for you if you hand it a bare host. The tokenizer integration is the quietly important part: it uses Hugging Face’s tokenizers library to count real tokens and trim prompts to whatever context window you’ve configured, rather than guessing with character counts.
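To make the context-window idea concrete, here is a minimal Python sketch of budget-based prompt trimming. It is an illustration only, not llm-ls’s actual code: the names fit_to_window, char_count, and count_tokens are hypothetical, and the strategy (take lines alternately above and below the cursor, nearest first, until the budget runs out) is an assumption about how such trimming could work.

```python
from typing import Callable, List, Tuple


def char_count(text: str) -> int:
    """Crude fallback counter: treats every character as one token."""
    return len(text)


def fit_to_window(
    before: List[str],
    after: List[str],
    context_window: int,
    count_tokens: Callable[[str], int] = char_count,
) -> Tuple[List[str], List[str]]:
    """Keep the lines closest to the cursor until the token budget is spent.

    `before` holds the lines above the cursor (top to bottom), `after` the
    lines below it. Hypothetical helper, not part of llm.nvim or llm-ls.
    """
    budget = context_window
    kept_before: List[str] = []
    kept_after: List[str] = []
    b = len(before) - 1  # index of the nearest line above the cursor
    a = 0                # index of the nearest line below the cursor
    while b >= 0 or a < len(after):
        if b >= 0:
            cost = count_tokens(before[b])
            if cost > budget:
                break  # the next line does not fit, stop trimming here
            budget -= cost
            kept_before.insert(0, before[b])
            b -= 1
        if a < len(after):
            cost = count_tokens(after[a])
            if cost > budget:
                break
            budget -= cost
            kept_after.append(after[a])
            a += 1
    return kept_before, kept_after
```

The pluggable count_tokens argument is where a real tokenizer would go. With the tokenizers Python package, for instance, a counter could return len(tokenizer.encode(text).ids); the default here mirrors the naive character-count fallback.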
Key highlights
- Supports fill-in-the-middle (FIM) for models like StarCoder and CodeLlama with configurable special tokens
- Tokenizer can load from a local file, a Hugging Face repo, or a custom HTTP endpoint — or fall back to naive character counting
- Auto-suggest can be toggled per filetype or path pattern, or disabled at startup and triggered manually
- llm-ls binary auto-installs on first run, or can be pointed at a Mason install or custom build
- API token resolution has sensible precedence: explicit config, env var, HF_HOME file, or huggingface-cli login
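For the fill-in-the-middle highlight, a short Python sketch shows what “configurable special tokens” amounts to in practice. The FimTokens class and build_prompt function are hypothetical names made up for illustration. The sentinel strings are the StarCoder-style ones; other models, CodeLlama included, use different strings, which is why they need to be configurable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FimTokens:
    """Sentinel strings a FIM-capable model expects (hypothetical container)."""
    prefix: str
    middle: str
    suffix: str


# StarCoder-style sentinels; swap these out for other model families.
STARCODER = FimTokens(
    prefix="<fim_prefix>",
    middle="<fim_middle>",
    suffix="<fim_suffix>",
)


def build_prompt(before_cursor: str, after_cursor: str,
                 fim: Optional[FimTokens]) -> str:
    """Assemble a prompt in prefix-suffix-middle order.

    With FIM disabled (fim is None) only the text before the cursor is
    sent, so the model simply continues it.
    """
    if fim is None:
        return before_cursor
    # The model generates the "middle" after the final sentinel.
    return f"{fim.prefix}{before_cursor}{fim.suffix}{after_cursor}{fim.middle}"
```

Calling build_prompt("def add(a, b):\n    return ", "\n", STARCODER) wraps the code on both sides of the cursor in the three sentinels, and the model’s output is the text that belongs at the cursor.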
Caveats
- The README warns that free-tier Hugging Face Inference API users will hit rate limits; the fix is a paid plan
- One copy-paste error in the docs: the OpenAI backend section accidentally tells you to “Refer to Ollama’s documentation”
Verdict
Worth a look if you want Copilot-like completions but need to run local models, switch backends, or just avoid another subscription. Skip it if you want plug-and-play polish: this is a tinkerer’s setup.