AIDC-AI/Pixelle-Video

One-line prompt to TikTok: the assembly line for AI slop

Pixelle-Video wires LLMs, image/video generators, TTS engines, and ffmpeg into a single Streamlit app that spits out short-form videos from a topic string.

★ 21.7k stars · Python · Image · Video · Audio · Creative · Design


Velocity (7d): +102 ★ / day · Trend: steady

What it does

Pixelle-Video is a Python pipeline that automates the whole short-video workflow: it writes a script, generates matching AI images or video clips, synthesizes a voiceover, adds background music, and stitches the result together. Everything is exposed through a Streamlit web UI aimed at users with no editing experience. The project also bundles a Windows one-click installer, so you can skip the Python, uv, and ffmpeg setup entirely.
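
The stage order described above can be pictured as a small orchestration loop. This is a minimal sketch, not Pixelle-Video's actual code: `Scene`, `run_pipeline`, and the four stage callables are hypothetical names chosen for illustration, and the stub stages only return file names instead of calling an LLM, an image model, a TTS engine, or ffmpeg.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Scene:
    """One narrated shot: a line of script plus the assets generated for it."""
    narration: str
    image_path: str = ""
    audio_path: str = ""


def run_pipeline(
    topic: str,
    write_script: Callable[[str], List[str]],
    make_image: Callable[[str, int], str],
    make_voice: Callable[[str, int], str],
    stitch: Callable[[List[Scene]], Any],
) -> Any:
    """Run the stages in order: script -> per-scene image and voice -> stitch.

    Every stage is passed in as a plain callable, so this function knows
    nothing about which provider sits behind each step.
    """
    scenes = [Scene(narration=line) for line in write_script(topic)]
    for index, scene in enumerate(scenes):
        scene.image_path = make_image(scene.narration, index)
        scene.audio_path = make_voice(scene.narration, index)
    # Background music and the final encode would live inside `stitch`.
    return stitch(scenes)


if __name__ == "__main__":
    # Stub stages (assumptions for the demo): they fabricate file names only.
    video = run_pipeline(
        "why cats knock things over",
        write_script=lambda topic: [f"Intro to {topic}", f"Outro for {topic}"],
        make_image=lambda text, i: f"scene_{i}.png",
        make_voice=lambda text, i: f"scene_{i}.mp3",
        stitch=lambda scenes: [(s.image_path, s.audio_path) for s in scenes],
    )
    print(video)
```

Passing the stages as arguments keeps the loop itself trivial, which is roughly why a tool like this can hide the whole workflow behind one text box.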

The interesting bit

The modular backend lets you swap components like Lego bricks: LLM (GPT, Qwen, DeepSeek, Ollama), image/video generator (ComfyUI locally, RunningHub cloud, or direct APIs like Kling/Seedream/WAN 2.1), and TTS (Edge-TTS, Index-TTS, etc.). That flexibility is unusual for a tool pitched at non-technical creators. Recent additions include motion transfer and digital-human avatars, pushing it beyond simple image slideshows.
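
One common way to get this kind of swappability in Python is a shared interface plus a name-to-backend registry. The sketch below is an assumption about the general pattern, not a description of how Pixelle-Video implements it: `TTSBackend`, `FakeEdgeTTS`, `FakeIndexTTS`, `REGISTRY`, and `get_tts` are hypothetical, and the fake classes do not wrap the real Edge-TTS or Index-TTS libraries.

```python
from typing import Dict, Protocol


class TTSBackend(Protocol):
    """Anything with a matching synthesize() method counts as a TTS backend."""

    def synthesize(self, text: str) -> bytes:
        ...


class FakeEdgeTTS:
    # Placeholder: returns tagged bytes instead of real audio.
    def synthesize(self, text: str) -> bytes:
        return b"edge:" + text.encode("utf-8")


class FakeIndexTTS:
    # Placeholder: a second backend with the same method signature.
    def synthesize(self, text: str) -> bytes:
        return b"index:" + text.encode("utf-8")


# Hypothetical registry keyed by the name a user would pick in a settings UI.
REGISTRY: Dict[str, TTSBackend] = {
    "edge-tts": FakeEdgeTTS(),
    "index-tts": FakeIndexTTS(),
}


def get_tts(name: str) -> TTSBackend:
    """Look up a backend by name, failing loudly on unknown names."""
    try:
        return REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unknown TTS backend {name!r}; choose one of: {known}") from None


if __name__ == "__main__":
    for name in ("edge-tts", "index-tts"):
        print(name, get_tts(name).synthesize("hello"))
```

The calling code only ever sees `synthesize`, so switching providers becomes a configuration change rather than a code change. The same shape would apply to the LLM and image/video slots.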

Key highlights

Supports both local ComfyUI workflows and direct API calls to Chinese and Western model providers
Handles portrait and landscape formats with template-based visual styling
Windows standalone package with start.bat; macOS and Linux users must install uv and ffmpeg manually
Motion transfer and image-to-video pipelines added in early 2026
Voice cloning and multi-language TTS voices available
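
To make the portrait/landscape point concrete, here is a hedged sketch of how a single still image and its voiceover could be turned into a clip with ffmpeg. It only builds the argument list and does not run anything. The function name `build_clip_command`, the 1080x1920 and 1920x1080 sizes, and the choice of libx264/AAC are assumptions for illustration; the project's real templates and encoder settings may differ.

```python
from typing import List

# Assumed output sizes (width, height); not taken from the project.
SIZES = {"portrait": (1080, 1920), "landscape": (1920, 1080)}


def build_clip_command(
    image: str, audio: str, output: str, orientation: str = "portrait"
) -> List[str]:
    """Return an ffmpeg argv that shows one image for the length of the audio."""
    if orientation not in SIZES:
        raise ValueError(f"orientation must be one of {sorted(SIZES)}")
    width, height = SIZES[orientation]
    # Fit the image inside the frame, then pad the remainder so it is centered.
    video_filter = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the still image as a video stream
        "-i", audio,
        "-vf", video_filter,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                 # stop when the audio ends
        output,
    ]


if __name__ == "__main__":
    print(" ".join(build_clip_command("scene_0.png", "scene_0.mp3", "scene_0.mp4")))
```

The resulting list can be handed to `subprocess.run` once ffmpeg is installed, which is the manual step the macOS/Linux note above refers to.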

Caveats

The README is overwhelmingly in Chinese; English documentation exists but is clearly secondary
Heavy reliance on external API keys and services (LLM, image, video, TTS) means ongoing costs and potential rate-limit headaches
“Zero threshold” claim assumes you already have or are willing to buy API access for multiple services

Verdict

Worth a look if you run a content farm, want to prototype short videos fast, or need a ComfyUI frontend for non-technical teammates. Skip it if you care about editorial control, fine-grained timing, or avoiding the homogenized look of template-driven AI video.