Bilibili meets storyboard: an AI sketch pipeline for Chinese social video
A React app that ingests Bilibili and Xiaohongshu links, lets you tag frames, and feeds them to Gemini for hand-drawn storyboards and viral copy.

What it does
ClipSketch AI is a browser-based video annotation tool built for Chinese social-media creators. Paste a Bilibili or Xiaohongshu link, scrub through with frame-level keyboard controls, hit T to tag moments, then hand the tagged frames to Google Gemini. The model returns a unified hand-drawn storyboard, three flavors of “grass-growing” (product-seeding) copy, and a vertical cover image. Everything runs in a React 19 + Tailwind frontend with IndexedDB for local state; a Docker image is provided.
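To make the scrub-and-tag loop concrete, here is a minimal sketch of the bookkeeping behind frame stepping, tagging, and a plain-text timeline. The 30 fps frame rate, the `Tag` shape, and every function name are assumptions for illustration; none of it is taken from ClipSketch AI's source.

```typescript
// Hypothetical helpers: names and the 30 fps default are assumptions.
const ASSUMED_FPS = 30;

interface Tag {
  timeSec: number;
}

/** Move the playhead by a whole number of frames, clamped to the clip bounds. */
function stepFrames(
  currentSec: number,
  frames: number,
  durationSec: number,
  fps: number = ASSUMED_FPS
): number {
  const next = currentSec + frames / fps;
  return Math.min(Math.max(next, 0), durationSec);
}

/** Left-pad a non-negative integer with zeros up to the given width. */
function pad(n: number, width: number): string {
  const s = String(n);
  return s.length >= width ? s : "0".repeat(width - s.length) + s;
}

/** Format seconds as mm:ss.mmm for a plain-text timeline. */
function formatTimestamp(sec: number): string {
  const totalMs = Math.round(sec * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const seconds = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(ms, 3)}`;
}

/** Add a tag, keeping the list sorted by time and skipping exact repeats. */
function addTag(tags: Tag[], timeSec: number): Tag[] {
  if (tags.some((t) => Math.abs(t.timeSec - timeSec) < 1e-6)) return tags;
  return [...tags, { timeSec }].sort((a, b) => a.timeSec - b.timeSec);
}

/** Render the tags as numbered lines, one timestamp per line. */
function exportTimeline(tags: Tag[]): string {
  return tags.map((t, i) => `${i + 1}. ${formatTimestamp(t.timeSec)}`).join("\n");
}
```

In a browser, a keydown handler would presumably assign the result of `stepFrames` to the video element's `currentTime` and call `addTag` when T is pressed.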
The interesting bit
The project treats Gemini as a full creative department rather than a chatbot: one model call synthesizes multiple tagged frames into a coherent visual narrative, another generates platform-native copy in three distinct voices (emotional story, dry tutorial, punchy micro-format). The README also notes a batch-processing mode and custom-character fusion, suggesting the author has actually burned through API quota tuning the pipeline.
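As a rough illustration of the "several frames in, one storyboard out" call, the sketch below packs tagged frames into a single multimodal request object. The prompt wording, the `TaggedFrame` type, and the function name are hypothetical, and the text-plus-`inlineData` part layout is my assumption about the request shape, so check it against the current Gemini documentation. The sketch only builds the payload and sends nothing.

```typescript
// Hypothetical types and prompt text; only the model name comes from the README.
interface TaggedFrame {
  timeSec: number;
  pngBase64: string; // frame capture, base64-encoded PNG
}

interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface StoryboardRequest {
  model: string;
  contents: { role: "user"; parts: Part[] }[];
}

/** Build one request that carries every tagged frame, in timeline order. */
function buildStoryboardRequest(
  frames: TaggedFrame[],
  model: string = "gemini-3-pro-image-preview"
): StoryboardRequest {
  if (frames.length === 0) {
    throw new Error("Tag at least one frame first");
  }
  // Sort a copy so the caller's array is left untouched.
  const ordered = [...frames].sort((a, b) => a.timeSec - b.timeSec);
  const parts: Part[] = [
    { text: `Redraw these ${ordered.length} frames, in order, as one hand-drawn storyboard.` },
  ];
  for (const frame of ordered) {
    // A short caption before each image tells the model where the frame sits in time.
    parts.push({ text: `Frame at ${frame.timeSec.toFixed(2)}s` });
    parts.push({ inlineData: { mimeType: "image/png", data: frame.pngBase64 } });
  }
  return { model, contents: [{ role: "user", parts }] };
}
```

Sending all frames in one request, rather than one call per frame, is what would let the model keep characters and style consistent across panels.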
Key highlights
- Imports Bilibili and Xiaohongshu share links (including mixed-text shares) and proxies cross-origin video playback with referrerPolicy="no-referrer"
- Frame-accurate tagging with the T hotkey; exports TXT timelines or ZIPs of captured frames
- Storyboard generation via gemini-3-pro-image-preview; copy and cover via gemini-3-pro-preview
- Responsive layout that flips to vertical stack on mobile
- Docker one-liner: docker run -p 3000:3000 earisty/clipsketch-ai:latest
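
A "mixed-text share" is what the mobile apps put on your clipboard: a title, some emoji, and a URL buried in the middle. One way to handle it is to pull out every URL and match its host against a per-platform list. The sketch below does that; the host lists (including the b23.tv and xhslink.com short-link domains) and all names are my assumptions, not the project's actual parser.

```typescript
type Platform = "bilibili" | "xiaohongshu";

interface ParsedShare {
  platform: Platform;
  url: string;
}

// Assumed host lists; extend as needed.
const HOSTS: Record<Platform, string[]> = {
  bilibili: ["bilibili.com", "b23.tv"],
  xiaohongshu: ["xiaohongshu.com", "xhslink.com"],
};

/** Find the first supported video link inside free-form share text. */
function parseShareText(text: string): ParsedShare | null {
  // Stop at whitespace, CJK characters, and common full-width punctuation,
  // since share text often glues Chinese prose directly onto the URL.
  const candidates = text.match(/https?:\/\/[^\s\u4e00-\u9fff，。！？【】]+/g) || [];
  for (const raw of candidates) {
    let host: string;
    try {
      host = new URL(raw).hostname.toLowerCase();
    } catch {
      continue; // not a parseable URL, try the next candidate
    }
    for (const platform of Object.keys(HOSTS) as Platform[]) {
      const matches = HOSTS[platform].some(
        (h) => host === h || host.endsWith("." + h)
      );
      if (matches) return { platform, url: raw };
    }
  }
  return null;
}
```

Matching on the hostname suffix rather than a substring of the whole text avoids false positives such as a lookalike domain that merely contains "bilibili.com" in its path.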
Caveats
- Requires a Google Cloud project with explicit access to gemini-3-pro-image-preview; expect 403s if your key lacks that model scope
- Video playback relies on proxy workarounds that may break if platform CDN policies shift
Verdict
Worth a spin if you regularly repurpose Chinese short-video content into illustrated threads or Xiaohongshu posts. Skip it if you need generic video editing or don’t have a Gemini API key with preview-model access.