A video localization factory that speaks agent
Go-based pipeline turns YouTube links into translated, dubbed, platform-formatted videos — and exposes every stage as CLI skills for AI agents to orchestrate.

What it does
KrillinAI runs the full drudge-work chain of video localization: download via yt-dlp, Whisper-based transcription, LLM translation with terminology replacement, TTS dubbing (CosyVoice or OpenAI), and reformatting for landscape or portrait output. It targets the specific platform geometries of Bilibili, Douyin, TikTok, YouTube, and others. Human users get a desktop or web UI; machines get a staged CLI and a skills/ directory of stable contracts.
The interesting bit
The project treats “AI Agent” as a first-class user, not a buzzword. Each pipeline stage emits structured artifacts and a krillinai_manifest.json manifest so subsequent stages can resume without re-running transcription. The CLI outputs a single JSON line to stdout on completion — built for being shelled out to, not merely tolerated.
Key highlights
- Staged CLI commands:
subtitle,tts,render-horizontal,render-vertical,pipeline,cover - Multiple Whisper backends: OpenAI cloud, FasterWhisper (local), WhisperKit (Apple Silicon), WhisperCpp, plus Alibaba Cloud ASR for mainland China
- LLM-agnostic: any OpenAI API-compatible endpoint, including local deployments
- Desktop app exists but README notes it “has some bugs that are continuously being updated”; server/web UI is the stable path
- macOS requires manual quarantine stripping and chmod for both versions — unsigned binaries
Caveats
- Desktop version explicitly flagged as newer and buggier; server/web deployment is the conservative choice
- macOS users must run
xattrandchmodcommands before either version will launch - TTS options are limited to Alibaba Cloud Voice Service and OpenAI TTS; no local TTS engine listed
Verdict Worth a look if you run content localization at volume or want to wire video translation into an automated workflow. Skip it if you need a polished one-click consumer app — the rough edges are documented, not hidden.