Glue code that frees you from API keys
A tiny Flask shim lets you run Meta's Code Llama inside VS Code without signing up for anything.

What it does
llamacpp_mock_api.py is a single-file Flask server that impersonates llama.cpp’s API. The Continue VS Code extension thinks it’s talking to llama.cpp, but it’s actually talking to Meta’s official codellama inference code. You get local code completion with no API key, no cloud service, and no Ollama.
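To make the idea of "impersonating" an API concrete, here is a minimal sketch of what such a shim could look like. It is not the repository's actual code. It assumes the client sends a llama.cpp-style POST to /completion with prompt, n_predict, and temperature fields and reads the generated text back from a content field. The names fake_generate, to_backend_args, to_llamacpp_response, and create_app are hypothetical, and the model call is a stub standing in for Meta's codellama generator.

```python
from typing import Callable


def fake_generate(prompt: str, max_tokens: int, temperature: float) -> str:
    # Hypothetical stand-in for the real model call. A real shim would hand
    # the prompt to Meta's codellama generator here instead of echoing it.
    return "# completion for: " + prompt


def to_backend_args(body: dict) -> dict:
    """Map a llama.cpp-style /completion request body to backend arguments.

    The default values are assumptions chosen for this sketch.
    """
    return {
        "prompt": str(body.get("prompt", "")),
        "max_tokens": int(body.get("n_predict", 128)),
        "temperature": float(body.get("temperature", 0.2)),
    }


def to_llamacpp_response(text: str) -> dict:
    """Wrap generated text in the JSON shape a llama.cpp client reads back."""
    return {"content": text, "stop": True}


def create_app(generate: Callable[[str, int, float], str] = fake_generate):
    # Flask is imported lazily so the helpers above work without it installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/completion", methods=["POST"])
    def completion():
        # Accept the JSON body even if the client omits the content type.
        body = request.get_json(force=True, silent=True) or {}
        args = to_backend_args(body)
        text = generate(args["prompt"], args["max_tokens"], args["temperature"])
        return jsonify(to_llamacpp_response(text))

    return app
```

Calling create_app().run(port=8080) would start the server locally (the port is an arbitrary choice here). Keeping the request and response translation in plain functions means the mapping can be checked without starting a server or loading a model.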
The interesting bit
The whole project is one Python file. The author admits it’s “glue” in the README, but it’s glue that solves a real platform gap: according to the README, Ollama didn’t support Windows or Linux at the time of writing, and Continue’s native llama.cpp provider expects a different interface than Meta’s torchrun setup. This shim bridges the two without touching either codebase.
Key highlights
- Single-file implementation (llamacpp_mock_api.py)
- Cross-platform wherever Meta’s codellama runs (explicitly: Windows and Linux, unlike Ollama)
- Zero API keys or account signups required
- Works with Continue’s existing llama.cpp configuration: just swap the model name to codellama-7b
- Requires only Flask as an additional dependency
Caveats
- You must already have Meta’s codellama running independently; this doesn’t bundle or simplify the model setup
- The README’s “as of the time of writing” caveat suggests the landscape may have shifted since then
Verdict
Worth 10 minutes if you’re already running Code Llama locally and want VS Code integration without Ollama’s platform limits. Skip it if you want a one-click installer, or if you’re on macOS, where Ollama works fine anyway.