Teaching LLaMA to speak Godot, not Python with a hat
A finetuning pipeline that trains on human-written GDScript and uses GPT-3.5 only for labeling, producing models that reportedly beat GPT-4 on syntax accuracy for Godot's niche scripting language.

What it does
godot-dodo scrapes MIT-licensed Godot repositories from GitHub, splits their .gd files into individual functions, and uses gpt-3.5-turbo to auto-generate a descriptive comment for each one. The resulting comment:code pairs are then used to finetune LLaMA variants (7B and 13B) specifically for GDScript generation. The repo includes the full dataset assembly scripts, pre-built 60k-row datasets, finetuning configs, and published model weights.
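To make the "split .gd files into functions" step concrete, here is a minimal sketch in Python. It is not the repo's actual implementation: the `split_functions` name and the indentation-based heuristic are assumptions made for illustration. It treats any line starting with `func` (or `static func`) at column 0 as a function header and collects the indented lines that follow.

```python
import re

# Assumed heuristic: a top-level function starts with "func" or "static func" at column 0.
FUNC_HEADER = re.compile(r"^(?:static\s+)?func\s+\w+\s*\(")


def split_functions(source: str) -> list[str]:
    """Return the source text of each top-level function in a GDScript file."""
    functions = []   # completed functions, each stored as a list of lines
    current = None   # lines of the function currently being collected

    for line in source.splitlines():
        if FUNC_HEADER.match(line):
            # A new header closes the previous function, if any.
            if current:
                functions.append(current)
            current = [line]
        elif current is not None:
            if not line.strip() or line[0] in " \t":
                # Blank or indented lines belong to the current function body.
                current.append(line)
            else:
                # Any other unindented line (var, signal, class, ...) ends the function.
                functions.append(current)
                current = None

    if current:
        functions.append(current)

    # Join lines back together and drop trailing blank lines.
    return ["\n".join(lines).rstrip() for lines in functions]
```

This sketch ignores cases a real splitter would need to handle, such as annotations on the line before `func`, unindented comments inside a body, and functions nested in inner classes.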
The interesting bit
The twist is what the LLM isn’t used for. Unlike Alpaca-style approaches that synthesize training outputs from larger models, this project uses human-written code exclusively for the code side, with GPT-3.5 only as a labeler. The README claims the resulting models achieve “significantly greater consistency” than GPT-4/GPT-3.5-turbo on GDScript syntax, with code-specific base models even outperforming on complex instructions.
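The division of labor described above can be sketched as follows. This is an illustration under assumptions, not the repo's code: the `instruction` and `output` field names follow the common Alpaca-style layout, and `stub_labeler` is a hypothetical stand-in for the call to a labeling model. The point is that the model-generated text only ever fills the instruction side, while the human-written function is stored unmodified as the training target.

```python
import json
from typing import Callable


def build_records(functions: list[str], label: Callable[[str], str]) -> list[dict]:
    """Pair each human-written function with a generated description."""
    records = []
    for code in functions:
        records.append({
            "instruction": label(code).strip(),  # generated by the labeler
            "output": code,                      # human-written code, left untouched
        })
    return records


def stub_labeler(code: str) -> str:
    """Hypothetical placeholder for a labeling model; it only echoes the header line."""
    header = code.splitlines()[0]
    return f"Implement the function declared as `{header}`."


if __name__ == "__main__":
    sample = ['func _ready():\n\tprint("hi")']
    print(json.dumps(build_records(sample, stub_labeler), indent=2))
```

Passing the labeler in as a callable keeps the pairing logic independent of any particular API client, so the stub can be swapped for a real model call without touching `build_records`.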
Key highlights
- Full pipeline from GitHub scraping to published HuggingFace weights
- Pre-assembled 60k-row dataset for Godot 4.x projects; 3.x data noted as future work
- Training cost: $30 for dataset labeling, $24–$84 for finetuning runs on 8x A100 80GB GPUs
- Explicitly MIT-only code sourcing, with all scraped projects listed for attribution
- Colab inference notebook included for trying models without a local GPU
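For a rough end-to-end budget, the cost figures in the list above can be combined. The sketch below assumes the labeling cost is paid once and that the $24 to $84 range is the cost of a single finetuning run; neither assumption is stated in the source, and the function name is illustrative.

```python
LABELING_USD = 30              # one-time dataset labeling cost, from the list above
FINETUNE_USD_RANGE = (24, 84)  # assumed to be the cost range of one finetuning run


def total_cost_range(labeling: int, finetune_range: tuple[int, int], runs: int = 1) -> tuple[int, int]:
    """Return (low, high) total cost for one labeling pass plus `runs` finetuning runs."""
    low, high = finetune_range
    return labeling + runs * low, labeling + runs * high


print(total_cost_range(LABELING_USD, FINETUNE_USD_RANGE))  # (54, 114)
```

Under those assumptions, reproducing one model from scratch lands between $54 and $114, excluding any failed or exploratory runs.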
Caveats
- The README identifies a real weakness: models learn to reference objects initialized outside function scope, producing code that assumes unimplemented context
- Hardware requirements are steep: the 13B model needed eight A100s, and NVLink is recommended over PCIe
- Godot 3.x dataset and model variants are missing, even though the methodology is not tied to a particular language or engine version
Verdict
Worth a look if you’re building Godot tooling or exploring domain-specific LLM finetuning on a budget. Skip if you need general-purpose code generation or lack access to serious GPU hardware.