YouTube Shorts factory: AI writes, dubs, and edits the whole pipeline
A Python framework that automates the entire short-form video workflow from script to rendered upload, including voice synthesis and footage sourcing.

What it does ShortGPT is a Python framework that automates short-form video creation end-to-end. It generates scripts with LLMs, synthesizes voiceovers in 30+ languages via EdgeTTS or ElevenLabs, sources background footage from Pexels and Bing Images, auto-generates captions, and renders the final cut with MoviePy. It also handles longer videos and can dub/translate existing content into new languages. The whole thing ships as a Dockerized Gradio web app or a Google Colab notebook.
The interesting bit The framework exposes an “Editing Markup Language” — a JSON-based editing DSL designed to be readable and generatable by LLMs. This turns video editing from a GUI slog into a structured text problem that language models can actually reason about and produce.
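To make the idea concrete, here is a minimal sketch of what treating an edit as structured text can look like. The schema below (the `clips` list, the `type`, `source`, `text`, `start`, and `end` fields, and the `validate_timeline` and `total_duration` helpers) is hypothetical and is not ShortGPT's actual Editing Markup Language. It only illustrates why a JSON timeline is easy for a language model to emit and for ordinary code to check before rendering.

```python
import json

# Hypothetical timeline, shaped like something an LLM could emit as plain text.
# Field names are illustrative assumptions, not ShortGPT's real schema.
LLM_OUTPUT = """
{
  "clips": [
    {"type": "video",   "source": "background.mp4", "start": 0.0, "end": 6.0},
    {"type": "audio",   "source": "voiceover.mp3",  "start": 0.0, "end": 6.0},
    {"type": "caption", "text": "Did you know?",    "start": 0.5, "end": 2.5}
  ]
}
"""

ALLOWED_TYPES = {"video", "audio", "caption"}


def validate_timeline(raw: str) -> list[dict]:
    """Parse a JSON timeline and reject clips that could not be rendered."""
    clips = json.loads(raw)["clips"]
    for i, clip in enumerate(clips):
        if clip.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"clip {i}: unknown type {clip.get('type')!r}")
        if not 0 <= clip["start"] < clip["end"]:
            raise ValueError(f"clip {i}: start must be >= 0 and before end")
        # Captions carry text; media clips point at a file.
        required = "text" if clip["type"] == "caption" else "source"
        if required not in clip:
            raise ValueError(f"clip {i}: missing {required!r}")
    return clips


def total_duration(clips: list[dict]) -> float:
    """Length of the final cut: the latest end time across all clips."""
    return max(clip["end"] for clip in clips)


if __name__ == "__main__":
    clips = validate_timeline(LLM_OUTPUT)
    print(len(clips), "clips,", total_duration(clips), "seconds")
```

Because the edit is plain data, a malformed model response fails at validation time with a readable error instead of partway through a render.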
Key highlights
- Three specialized engines: ContentShortEngine (shorts with metadata), ContentVideoEngine (longer-form), and ContentTranslationEngine (dub + translate full videos)
- Voice support spans 30+ languages through Microsoft’s free EdgeTTS, with ElevenLabs as a premium alternative
- Asset sourcing is fully automated via the Pexels API and Bing image search
- Persistent state handled by TinyDB — no external database required
- Local Docker setup or zero-install Google Colab option
Caveats
- Documentation is explicitly noted as incomplete (“More documentation incomming, please be patient”)
- Requires Docker for local runs; the README defers to a separate installation-notes.md for details
- Heavy reliance on external APIs (OpenAI, ElevenLabs, Pexels) means ongoing costs and potential rate-limit friction
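One common way to soften rate-limit friction is to retry failed API calls with exponential backoff. The sketch below is a generic pattern, not something taken from ShortGPT; `RateLimitError`, `call_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever error a real API client raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # 1x, 2x, 4x, ... the base delay, plus jitter so that parallel
            # workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_request():
        # Simulated API call that is rate-limited twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitError("too many requests")
        return "ok"

    print(call_with_backoff(flaky_request, base_delay=0.01))
```

Backoff reduces failed runs but does nothing for the per-call cost, so it addresses only half of this caveat.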
Verdict Worth a look if you’re running a faceless YouTube or TikTok channel and want to automate the content mill. Skip it if you need fine-grained creative control — this is assembly-line automation, not an artist’s workstation.