LLMs that finally learned to draw — in Python
A project that turns text prompts into Manim animation code, with a clever training pipeline that uses the renderer itself as a reward signal.

What it does
Generative Manim is a toolkit that feeds your text prompt to an LLM and gets back Python code for Manim, the math-animation engine. Run that code, and you have a video. The project wraps this in a demo web app, an API, and a cloud deployment guide. It supports a small zoo of models — OpenAI’s GPT-4o and GPT-5.5, several Claude variants, Google’s Gemini, and a growing set of open-weight models via Featherless.
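To make the prompt-to-video flow concrete, here is a minimal sketch of the step between "the LLM replied" and "Manim renders". It is not the project's code: `SAMPLE_REPLY`, `extract_code`, `find_scene_names`, and `render_command` are hypothetical names, and the model reply is hard-coded rather than fetched from an API. The only real interface assumed is the Manim Community command line, where `-ql` requests a low-quality render.

```python
import ast
import re

FENCE = "`" * 3  # built at runtime so this listing can itself sit in a fenced block

# Hypothetical model reply: prose wrapped around one fenced Python block.
SAMPLE_REPLY = (
    "Here is your animation:\n"
    f"{FENCE}python\n"
    "from manim import Scene, Circle, Create\n"
    "\n"
    "class CircleDemo(Scene):\n"
    "    def construct(self):\n"
    "        self.play(Create(Circle()))\n"
    f"{FENCE}\n"
    "Render it with the manim CLI."
)


def extract_code(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none."""
    pattern = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return (match.group(1) if match else reply).strip()


def find_scene_names(code: str) -> list[str]:
    """List top-level classes that directly subclass a base named Scene."""
    names = []
    for node in ast.parse(code).body:
        if isinstance(node, ast.ClassDef):
            bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
            if "Scene" in bases:
                names.append(node.name)
    return names


def render_command(script_path: str, scene: str) -> list[str]:
    """Argument list for a low-quality Manim Community render."""
    return ["manim", "-ql", script_path, scene]


code = extract_code(SAMPLE_REPLY)
scenes = find_scene_names(code)
command = render_command("scene.py", scenes[0])
print(command)
```

Parsing the reply with `ast` instead of importing it means the generated code is inspected without being executed, which is a reasonable default when the code comes from a model.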
The interesting bit
The real meat is the training pipeline for open-source models. The project distills from GPT-4o through supervised fine-tuning (SFT), then applies DPO on pairs of outputs where one rendered and one failed, then GRPO: reinforcement learning where the Manim renderer itself acts as a deterministic reward signal. Code either renders or crashes, so no separate learned reward model is needed. It’s the same trick DeepSeek-R1 uses with math answer checkers, applied to animation code.
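The renderer-as-reward idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project's training code: `execution_reward` and `group_relative_advantages` are hypothetical names, and the sketch runs each candidate with the plain Python interpreter where a real setup would presumably invoke Manim. The second function shows the group-relative standardization commonly associated with GRPO, in which each sample is scored against the other samples drawn for the same prompt.

```python
import statistics
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_reward(code: str, timeout_s: float = 30.0) -> float:
    """Return 1.0 if the script exits cleanly, else 0.0 (crash or timeout)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            # Assumption: plain interpreter stands in for the Manim renderer.
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All samples tied: the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Four hypothetical completions sampled for one prompt.
candidates = ["print('ok')", "raise SystemExit(1)", "x = 1 / 0", "import math"]
rewards = [execution_reward(c) for c in candidates]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

One consequence worth noticing: when every sample in a group renders, or every sample crashes, the advantages are all zero, so a binary reward like this only teaches the model on prompts where its samples disagree.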
Key highlights
- 12+ model backends, from GPT-4o to Qwen 2.5 Coder, with a unified interface
- 3-stage open-source training: SFT → DPO → GRPO, using QLoRA to fit on free Kaggle T4 GPUs
- Executable benchmark suite with render-based scoring, pass@k evaluation, and JSONL reports
- Fixes a command-injection vulnerability in the ffmpeg export path (credited to a contributor)
- Active Discord community and multi-language docs
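For readers new to pass@k, the metric named in the benchmark highlight above: it estimates the chance that at least one of k sampled generations succeeds, here meaning that it renders. Below is a sketch of the standard unbiased estimator popularized by code-generation benchmarks. Whether this project's suite computes it the same way is an assumption, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c succeeded.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no success.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 10 generations for a prompt, 3 of them rendered.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
print(round(p1, 4), round(p5, 4))
```

With k = 1 the estimate reduces to the plain success rate c / n. Larger k rewards models that succeed occasionally, which matters when a caller can afford to retry a failed render.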
Caveats
- The open-source models (Qwen, DeepSeek Coder, CodeLlama) are marked 🚧 — work in progress
- The README calls this a “concept” and “prototype”; production readiness is unclear
- The benchmark is described as an “MVP”, so by the project’s own account it is at an early stage
Verdict
Worth a look if you’re building LLM-to-code pipelines or need programmatic video generation without touching After Effects. Skip it if you want a polished, end-to-end consumer tool — this is still research-grade infrastructure with sharp edges.