Claude Code, meet your new video editor: ffmpeg and a 12KB transcript
An open-source skill that lets coding agents edit raw footage into finished videos by reading transcripts instead of watching frames.

What it does
video-use is a skill you register with Claude Code, Codex, or similar agents. Drop raw footage in a folder, tell the agent “edit these into a launch video,” and it returns final.mp4: filler words cut, color graded, subtitled, and optionally peppered with animation overlays. It targets talking heads, tutorials, and interviews (anything with speech) and keeps all outputs in your project folder, leaving the skill directory untouched.
The interesting bit
The LLM never “watches” the video. It reads a ~12KB packed transcript with word-level timestamps and speaker labels, then requests on-demand “timeline view” PNGs (filmstrips with waveforms) only at ambiguous decision points. The README contrasts this with the naive approach: 30,000 frames at 1,500 tokens each would be 45 million tokens of noise. Instead, the agent reasons over text and asks for visuals sparingly, much as browser-use hands an LLM the page's DOM instead of a screenshot.
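To make the “packed transcript” idea concrete, here is a minimal sketch of how word-level timestamps could be collapsed into compact, speaker-labelled lines. The `Word` shape, the `pack_transcript` name, the 0.5 s pause threshold, and the `[start-end] SPEAKER: text` line format are all hypothetical; the review doesn't describe video-use's actual packing format. The last constant reproduces the README's arithmetic for the naive frame-by-frame approach.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One transcribed word (hypothetical shape, not a specific API's schema)."""
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str

def pack_transcript(words: list[Word], max_gap: float = 0.5) -> str:
    """Group consecutive words into one line per phrase.

    A new line starts when the speaker changes or when the silence before a word
    exceeds max_gap seconds. Output format (hypothetical): "[start-end] SPEAKER: text".
    """
    lines: list[str] = []
    phrase: list[Word] = []

    def flush() -> None:
        # Emit the current phrase as a single line, then start a fresh one.
        if phrase:
            text = " ".join(w.text for w in phrase)
            lines.append(f"[{phrase[0].start:.2f}-{phrase[-1].end:.2f}] {phrase[0].speaker}: {text}")
            phrase.clear()

    for w in words:
        if phrase and (w.speaker != phrase[-1].speaker or w.start - phrase[-1].end > max_gap):
            flush()
        phrase.append(w)
    flush()
    return "\n".join(lines)

# The README's naive baseline: 30,000 frames at 1,500 tokens per frame.
NAIVE_FRAME_TOKENS = 30_000 * 1_500  # 45,000,000 tokens

if __name__ == "__main__":
    demo = [
        Word("So", 0.00, 0.18, "S1"),
        Word("um", 0.20, 0.45, "S1"),
        Word("today", 1.30, 1.62, "S1"),
        Word("Thanks", 1.70, 2.00, "S2"),
    ]
    print(pack_transcript(demo))
```

On the demo input this prints three lines: `[0.00-0.45] S1: So um`, `[1.30-1.62] S1: today`, and `[1.70-2.00] S2: Thanks`. Splitting on pauses is one plausible design choice. Long gaps become line breaks, which is where an editor would look for cuts, and fillers like “um” stay visible as text the agent can decide to drop.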
Key highlights
- Self-evaluation loop: renders output, checks every cut boundary for pops and jumps, re-renders up to 3 times before showing you anything
- Parallel sub-agents spawn for animation overlays via HyperFrames, Remotion, Manim, or PIL
- Session memory persists in project.md, so you can resume editing next week
- 30ms audio fades at every cut, auto color grading, customizable subtitle burning
- Requires ElevenLabs Scribe for transcription and ffmpeg for rendering; yt-dlp optional for online sources
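The “30ms audio fades at every cut” bullet can be illustrated with stock ffmpeg filters. The sketch below turns a list of keep-ranges into a `-filter_complex` graph. Each range is cut out with `trim`/`atrim`, its audio gets a 30 ms `afade` in and out so the splice doesn't click, and the pieces are joined with `concat`. The `build_cut_command` helper, the keep-range input, and the file names are hypothetical. This is not video-use's renderer, only one standard way to do the same job.

```python
import shlex

FADE_S = 0.03  # 30 ms fade, matching the figure in the README

def build_cut_command(src: str, keeps: list[tuple[float, float]], out: str = "final.mp4") -> list[str]:
    """Return an ffmpeg argv that keeps only the (start, end) ranges of src, in seconds.

    Hypothetical helper: each kept segment's audio fades in and out over FADE_S,
    so the hard splice between segments doesn't click.
    """
    if not keeps:
        raise ValueError("need at least one range to keep")
    parts: list[str] = []
    labels: list[str] = []
    for i, (start, end) in enumerate(keeps):
        dur = end - start
        if dur <= 2 * FADE_S:
            raise ValueError(f"segment {i} is too short for two {FADE_S}s fades")
        # Video: cut the range and reset timestamps so the segment starts at 0.
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        # Audio: same range, then a fade-in at 0 and a fade-out ending at the segment's end.
        parts.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS,"
            f"afade=t=in:st=0:d={FADE_S},"
            f"afade=t=out:st={dur - FADE_S:.3f}:d={FADE_S}[a{i}]"
        )
        labels.append(f"[v{i}][a{i}]")
    # concat expects its inputs interleaved per segment: [v0][a0][v1][a1]...
    parts.append(f"{''.join(labels)}concat=n={len(keeps)}:v=1:a=1[outv][outa]")
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", ";".join(parts),
        "-map", "[outv]", "-map", "[outa]",
        out,
    ]

if __name__ == "__main__":
    cmd = build_cut_command("raw.mp4", [(0.0, 4.2), (5.1, 12.75)])
    # Print for inspection; run it with subprocess.run(cmd, check=True) once ffmpeg is installed.
    print(shlex.join(cmd))
```

This sketch fades each segment's own edges instead of overlapping neighbours with `acrossfade`, so segment durations stay unchanged and audio stays in sync with video. The trade-off is a brief 60 ms dip in level around every cut.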
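The self-evaluation loop can be sketched the same way. The review doesn't say how video-use detects pops, so this substitutes the simplest possible check, which is an assumption. It flags a cut if the waveform jumps by more than a fixed threshold across the splice point, then re-renders until no cut is flagged or the attempts run out. Counting “up to 3 times” as three attempts in total is also an assumption. The `render` callback, the 0.2 threshold on samples normalized to [-1, 1], and both function names are hypothetical.

```python
from typing import Callable, Sequence

def boundary_pops(samples: Sequence[float], cut_indices: Sequence[int], threshold: float = 0.2) -> list[int]:
    """Return the cut positions where the waveform jumps by more than `threshold`
    between the last sample before the cut and the first sample after it.

    A large step discontinuity is what a listener hears as a click or pop.
    """
    return [
        i for i in cut_indices
        if 0 < i < len(samples) and abs(samples[i] - samples[i - 1]) > threshold
    ]

def render_until_clean(
    render: Callable[[int], tuple[list[float], list[int]]],
    max_attempts: int = 3,
) -> tuple[int, list[int]]:
    """Call render(attempt) until no boundary pops remain or attempts run out.

    render returns (mono samples, cut sample indices) for the rendered output.
    Returns (attempts used, pops still present after the last attempt).
    """
    pops: list[int] = []
    for attempt in range(1, max_attempts + 1):
        samples, cuts = render(attempt)
        pops = boundary_pops(samples, cuts)
        if not pops:
            return attempt, []
    return max_attempts, pops
```

A real checker would likely also look at video continuity (the “jumps” in the README), but the control flow stays the same: render, inspect every cut, and only surface the result once it passes or the retry budget is spent.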
Caveats
- ElevenLabs API key is mandatory for the transcription layer; no fallback mentioned
- README is bullish on the approach but offers no benchmarks comparing output quality or cost against human editors or traditional tools
- “Works for any content” is claimed, though the design principles explicitly prioritize audio-driven cuts — purely visual content (music videos, B-roll montages) seems less suited
Verdict
Worth a look if you’re already living in Claude Code and want to automate the tedious first pass of interview or tutorial editing. Traditional video editors and anyone needing frame-precise visual storytelling should probably keep their NLE.