Your coding agent becomes a 4K video factory
A SKILL.md workflow that turns a topic into platform-native video podcasts via Remotion, TTS, and a lot of API keys.

What it does
Video Podcast Maker is an agent-orchestrated pipeline that researches a topic, writes a script, generates narration through multiple TTS backends, and renders a 4K video in Remotion. It targets five platforms — Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels — with per-platform formatting, thumbnails, and call-to-action presets. The human’s job is to describe the topic and then mercilessly edit the resulting podcast.txt; the agent handles the rest.
The interesting bit
The README contains an unusually honest, all-caps warning to humans (not the AI) that the script is the single source of truth for every downstream step — “A weak script renders into 4K garbage.” It is refreshing documentation: it knows the bottleneck is prose quality, not rendering speed.
Key highlights
- Seven TTS backends supported, from free Edge TTS to ElevenLabs and CosyVoice, with bilingual Chinese/English mixing and per-project phoneme correction.
- Remotion-native 4K subtitles rendered in React/CSS; legacy FFmpeg burn-in still available.
- Platform-specific output rules baked in: Bilibili chapter timestamps, Xiaohongshu 3:4 thumbnails, Douyin 9:16 vertical-only shorts.
- Reusable React component library (ComparisonCard, DataBar, FlowChart, LottieAnimation, etc.) for composing section layouts.
- Manual style_profiles in user_prefs.json carry palette and typography across videos; automatic preference learning is on the roadmap.
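To make the platform rules above concrete, here is a minimal sketch of how such presets could be expressed as data. It is not taken from the repo: the names PlatformPreset, presets, heightFor, and formatTimestamp are hypothetical, and the only facts borrowed from the README summary are the 3:4 Xiaohongshu thumbnail, the 9:16 Douyin vertical format, and Bilibili chapter timestamps. The MM:SS timestamp format is an assumption for illustration.

```typescript
// Hypothetical sketch: these types and names are illustrative, not the repo's API.
type Aspect = { w: number; h: number };

interface PlatformPreset {
  thumbnailAspect?: Aspect;    // e.g. 3:4 thumbnails for Xiaohongshu
  videoAspect?: Aspect;        // e.g. 9:16 vertical shorts for Douyin
  chapterTimestamps?: boolean; // e.g. chapter list for Bilibili
}

const presets = {
  bilibili: { chapterTimestamps: true } as PlatformPreset,
  xiaohongshu: { thumbnailAspect: { w: 3, h: 4 } } as PlatformPreset,
  douyin: { videoAspect: { w: 9, h: 16 } } as PlatformPreset,
};

// Height for a given width at the given aspect ratio,
// bumped up to an even number if rounding lands on an odd one.
function heightFor(width: number, aspect: Aspect): number {
  const h = Math.round((width * aspect.h) / aspect.w);
  return h % 2 === 0 ? h : h + 1;
}

// Format a chapter start time in seconds as MM:SS (assumed format).
function formatTimestamp(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const pad = (n: number): string => (n < 10 ? "0" : "") + String(n);
  return pad(Math.floor(s / 60)) + ":" + pad(s % 60);
}
```

For example, a 2160-pixel-wide vertical frame at 9:16 comes out 3840 pixels tall, which is 4K turned on its side, and a chapter starting 125 seconds in would be labeled 02:05. Keeping the rules as plain data like this is one way to let a single render step branch per platform without scattering special cases through the code.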
Caveats
- Requires a separate Remotion project as foundation; this repo is the workflow skill, not a standalone app.
- Documented for macOS and Linux only; Windows compatibility is not mentioned.
- Heavy API key sprawl: Azure, Volcengine, Aliyun, ElevenLabs, Google Cloud, OpenAI, Gemini — most are optional, but the matrix is wide.
- The authors note the project is “still under active development and may not be fully mature yet.”
Verdict
Worth a look if you are already publishing to Chinese and Western video platforms and want to automate the mechanical parts — rendering, subtitles, platform formatting — while keeping creative control over the script. Skip it if you expect a one-click SaaS; this is a local toolchain that expects you to curate API keys and edit text.