browser-use/video-use

Claude Code, meet your new video editor: ffmpeg and a 12KB transcript

An open-source skill that lets coding agents edit raw footage into finished videos by reading transcripts instead of watching frames.

Velocity (7d): +162 ★ / day · Trend: steady

What it does

video-use is a skill you register with Claude Code, Codex, or similar agents. Drop raw footage in a folder, tell the agent “edit these into a launch video,” and it returns final.mp4 with filler words cut, color grading and subtitles applied, and, optionally, animation overlays added. It targets talking heads, tutorials, interviews, anything with speech, and keeps all outputs in your project folder, leaving the skill directory untouched.

The interesting bit

The LLM never “watches” the video. It reads a ~12KB packed transcript with word-level timestamps and speaker labels, then requests on-demand “timeline view” PNGs — filmstrips with waveforms — only at ambiguous decision points. The README contrasts this with the naive approach: 30,000 frames at 1,500 tokens each would be 45 million tokens of noise. Instead, the agent reasons over text and asks for visuals sparingly, like browser-use giving an LLM a DOM instead of a screenshot.
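The arithmetic behind that comparison is easy to check. The sketch below reuses the frame and per-frame token figures quoted above; the bytes-per-token ratio for the transcript is an assumption of ours (a common rule of thumb for English text), not a number from the README.

```python
# Back-of-envelope token budget, using the figures quoted above.
FRAMES = 30_000
TOKENS_PER_FRAME = 1_500
frame_tokens = FRAMES * TOKENS_PER_FRAME  # 45,000,000

# Assumption (not from the README): roughly 4 bytes of text per token.
TRANSCRIPT_BYTES = 12 * 1024
ASSUMED_BYTES_PER_TOKEN = 4
transcript_tokens = TRANSCRIPT_BYTES // ASSUMED_BYTES_PER_TOKEN

ratio = frame_tokens / transcript_tokens

print(f"naive frames: {frame_tokens:,} tokens")
print(f"transcript:   ~{transcript_tokens:,} tokens")
print(f"difference:   roughly {ratio:,.0f}x")
```

Under that assumption the transcript costs a few thousand tokens, about four orders of magnitude less than feeding every frame to the model. The on-demand timeline PNGs add to the total, but only at the decision points where the agent asks for them.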

Key highlights

  • Self-evaluation loop: renders output, checks every cut boundary for pops and jumps, re-renders up to 3 times before showing you anything
  • Spawns parallel sub-agents to build animation overlays with HyperFrames, Remotion, Manim, or PIL
  • Session memory persists in project.md so you can resume editing next week
  • 30ms audio fades at every cut, auto color grading, customizable subtitle burning
  • Requires ElevenLabs Scribe for transcription and ffmpeg for rendering; yt-dlp optional for online sources
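To make the transcript-driven cuts and the 30ms fades concrete, here is a minimal sketch of how word-level timestamps could become ffmpeg audio filter strings. This is not video-use's actual code: the `Word` type, the `FILLERS` set, and both helper functions are hypothetical names introduced for illustration. The only ffmpeg pieces used are the standard `atrim`, `asetpts`, and `afade` audio filters.

```python
from dataclasses import dataclass

FADE_S = 0.030  # 30 ms, the fade length quoted in the list above


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical filler list; a real tool would likely be more careful.
FILLERS = {"um", "uh", "er"}


def keep_segments(words, fillers=FILLERS):
    """Merge runs of non-filler words into (start, end) segments.

    Every dropped filler forces a cut, so it closes the current segment.
    Silences between kept words are not trimmed in this sketch.
    """
    segments = []
    open_segment = False
    for w in words:
        if w.text.lower().strip(".,!?") in fillers:
            open_segment = False
            continue
        if open_segment:
            start, _ = segments[-1]
            segments[-1] = (start, w.end)
        else:
            segments.append((w.start, w.end))
            open_segment = True
    return segments


def audio_filter(start, end, fade=FADE_S):
    """Build an ffmpeg audio filter chain: trim, reset timestamps, fade in/out."""
    dur = end - start
    fade = min(fade, dur / 2)  # keep both fades inside very short segments
    return (
        f"atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS,"
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={dur - fade:.3f}:d={fade:.3f}"
    )


WORDS = [
    Word("So", 0.00, 0.30),
    Word("um,", 0.35, 0.70),
    Word("welcome", 0.75, 1.20),
    Word("back.", 1.25, 1.60),
]

for seg_start, seg_end in keep_segments(WORDS):
    print(audio_filter(seg_start, seg_end))
```

Each printed string could be applied to one segment (for example inside a `-filter_complex` graph) before the segments are concatenated. The short fade on both sides of every cut is what suppresses the audible pop a hard cut can leave.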

Caveats

  • ElevenLabs API key is mandatory for the transcription layer; no fallback mentioned
  • README is bullish on the approach but offers no benchmarks comparing output quality or cost against human editors or traditional tools
  • The README claims it “works for any content,” but the design principles explicitly prioritize audio-driven cuts, so purely visual content (music videos, B-roll montages) seems less suited

Verdict

Worth a look if you’re already living in Claude Code and want to automate the tedious first pass of interview or tutorial editing. Professional editors and anyone needing frame-precise visual storytelling should probably keep their non-linear editor (NLE).
