SeungyounShin/Llama2-Code-Interpreter

Llama2 learns to run its own code, debug, and try again

A fine-tuned CodeLlama that generates Python, executes it, reads the output, and iterates, all packaged as a Gradio chatbot.


What it does

This project wraps a fine-tuned CodeLlama-7B model in a chat interface where the LLM can write Python, execute it locally, see the results (or errors), and continue the conversation with that context. It also persists variables across code blocks, so earlier calculations stay available. The README shows a demo of plotting Nvidia stock prices via Yahoo Finance and Matplotlib.
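
As a rough illustration of how variables can stay available across code blocks, here is a minimal sketch that runs each snippet with exec() against one shared dictionary. The PersistentSession class and its run method are hypothetical names chosen for this example; they are not taken from the repository, whose actual mechanism may differ.

```python
import contextlib
import io
import traceback
from typing import Tuple


class PersistentSession:
    """Hypothetical helper: runs code snippets in one shared namespace."""

    def __init__(self) -> None:
        # The same dict is passed to every exec() call, so names defined
        # by one snippet are still visible to the next one.
        self.namespace = {"__name__": "__session__"}

    def run(self, code: str) -> Tuple[bool, str]:
        """Execute code; return (succeeded, captured stdout or traceback)."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the error text so it can be shown back to the model.
            return False, buffer.getvalue() + traceback.format_exc()
        return True, buffer.getvalue()


session = PersistentSession()
ok1, out1 = session.run("prices = [10, 12, 15]")
ok2, out2 = session.run("print(sum(prices) / len(prices))")  # reuses prices
ok3, out3 = session.run("print(undefined_name)")  # fails with NameError
```

The second call succeeds only because the first call's prices list is still in the shared namespace; the third returns a traceback string instead of raising, which is the kind of text a chat loop could hand back to the model.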

The interesting bit

The author didn’t just prompt-engineer this: they fine-tuned CodeLlama-7B-Instruct on execution feedback data, raising HumanEval pass@1 from 34.8% to 70.12%. Roughly doubling the score of a 7B-parameter model suggests it learned something about code correctness, not just syntax.
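
To put the pass@1 figures in concrete terms, the sketch below converts them to problem counts. It assumes one scored sample per problem (the summary does not state the decoding setup) and uses HumanEval's 164 problems; the function names are hypothetical and the counts are inferred from the percentages, not reported by the author.

```python
HUMANEVAL_PROBLEMS = 164  # size of the HumanEval benchmark


def pass_at_1(solved: int, total: int = HUMANEVAL_PROBLEMS) -> float:
    """pass@1 as a percentage, assuming one scored sample per problem."""
    return 100.0 * solved / total


def solved_count(score_percent: float, total: int = HUMANEVAL_PROBLEMS) -> int:
    """Nearest whole number of solved problems implied by a pass@1 score."""
    return round(score_percent / 100.0 * total)


before = solved_count(34.8)   # implied count for the base model
after = solved_count(70.12)   # implied count for the fine-tuned model

print(before, after, after - before)
print(round(pass_at_1(after), 2))
```

Under these assumptions the scores correspond to 57 and 115 solved problems out of 164, a gain of 58 problems.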

Key highlights

  • Fine-tuned 7B model available on HuggingFace (Seungyoun/codellama-7b-instruct-pad)
  • Automatic code extraction, execution, and error feedback loop
  • Variable state persists across multiple code blocks in a session
  • Gradio UI with one-liner launch (python3 chatbot.py --path ...)
  • Also supports base Llama-2 chat models via --model_path
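
The extraction, execution, and error feedback loop listed above can be sketched roughly as follows. This is an assumption-laden illustration, not the repository's code: extract_code, execute, chat_loop, and fake_model are hypothetical names, the model is replaced by a stub, and the "Output:"/"Error:" message format is invented for the example.

```python
import contextlib
import io
import re
import traceback

FENCE = "`" * 3  # built at runtime so this example can itself sit in a fence
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply):
    """Return the first fenced code block in a model reply, or None."""
    match = CODE_BLOCK.search(reply)
    return match.group(1) if match else None


def execute(code, namespace):
    """Run code in the shared namespace; return (ok, output or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Feed back only the final traceback line to keep the prompt short.
        return False, traceback.format_exc().strip().splitlines()[-1]
    return True, buffer.getvalue()


def chat_loop(generate, question, max_turns=3):
    """Alternate between generation and execution until the code succeeds."""
    history = [question]
    namespace = {}  # shared across turns, so earlier variables persist
    for _ in range(max_turns):
        reply = generate(history)
        history.append(reply)
        code = extract_code(reply)
        if code is None:
            break  # plain-text answer, nothing to run
        ok, output = execute(code, namespace)
        history.append(("Output: " if ok else "Error: ") + output)
        if ok:
            break
    return history


def fake_model(history):
    """Stand-in for the LLM: the first reply has a typo, the second fixes it."""
    if any(turn.startswith("Error:") for turn in history):
        return "Fixed:\n" + FENCE + "python\nprint(total / 4)\n" + FENCE
    return "Try:\n" + FENCE + "python\ntotal = 10\nprint(totl / 4)\n" + FENCE


transcript = chat_loop(fake_model, "What is 10 / 4?")
print(transcript[-1])
```

With the stub model, the first attempt raises a NameError, the error line is appended to the history, and the retry prints 2.5. The retry can use total because it was assigned in the shared namespace before the first attempt failed.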

Caveats

  • The “access to internet” claim in the repo description appears to mean the generated code can call web APIs (e.g., Yahoo Finance), not that the model itself browses
  • The README doesn’t clarify sandboxing or security boundaries for arbitrary code execution
  • GSM8K math benchmark score (28%) lags behind larger Code Llama variants, so math reasoning remains a weak spot

Verdict

Worth a spin if you want a local, open-source alternative to GPT-4’s code interpreter and can tolerate running un-sandboxed generated code. Skip if you need bulletproof security or state-of-the-art math reasoning.
