bytedance/Bernini

ByteDance's video editor thinks before it renders

A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.

★ 1.2k stars | Python | Image · Video · Audio | Language Models

Velocity (7d): +7.1 ★/day · Trend: steady

What it does

Bernini is a unified video generation and editing system from ByteDance. It splits the work into two stages: an MLLM-based “semantic planner” figures out what should happen in the video, and a DiT-based “renderer” (Bernini-R) actually generates the pixels. The open-sourced piece is the renderer, which handles text-to-image, image editing, text-to-video, video editing, and reference-guided video tasks through a shared pipeline.
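
To make the two-stage split concrete, here is a minimal Python sketch of how a planner stage could hand off to a renderer stage. Every name in it (`EditPlan`, `toy_planner`, `toy_renderer`, `run_pipeline`) is hypothetical and is not Bernini's API. The stand-in functions only mimic the data flow, and the task abbreviations are the ones listed under Key highlights.

```python
from dataclasses import dataclass
from typing import Callable, List

# Task abbreviations as listed under "Key highlights".
TASKS = {"t2i", "i2i", "t2v", "v2v", "mv2v", "rv2v", "r2v"}


@dataclass
class EditPlan:
    """Hypothetical planner output; not a type from the Bernini codebase."""
    task: str
    instruction: str
    semantic_tokens: List[float]  # stand-in for latent semantic features


def toy_planner(task: str, prompt: str) -> EditPlan:
    """Stage 1 stand-in: decide what should happen, without touching pixels."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    words = prompt.split()
    return EditPlan(task, " ".join(words), [float(len(w)) for w in words])


def toy_renderer(plan: EditPlan, num_frames: int = 4) -> List[str]:
    """Stage 2 stand-in: a real renderer would run diffusion steps here."""
    return [f"frame{i}:{plan.task}:{plan.instruction}" for i in range(num_frames)]


def run_pipeline(
    task: str,
    prompt: str,
    planner: Callable[[str, str], EditPlan] = toy_planner,
    renderer: Callable[[EditPlan], List[str]] = toy_renderer,
) -> List[str]:
    plan = planner(task, prompt)  # cheap semantic planning first
    return renderer(plan)         # expensive generation second


frames = run_pipeline("v2v", "make the sky purple")
```

Because the renderer only depends on the plan object, either stage can be swapped out independently, which is the property that lets the renderer ship on its own.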

The interesting bit

The planner-renderer split is the architectural bet. The planner works in a latent semantic space, reasoning about motion, composition, and edits before any expensive diffusion steps run, while the renderer builds on Wan2.2-T2V-A14B and adds its own trained high-noise and low-noise transformer weights. For video editing specifically, the authors claim first-tier results against closed-source commercial models, based on human evaluation in an arena they built themselves.
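
The high-noise/low-noise weight pair suggests that different transformers handle different parts of the denoising trajectory. Below is a rough sketch of that idea, assuming a simple timestep threshold. The function names, the 1000-step range, and the 0.875 boundary are illustrative assumptions, not values confirmed by the Bernini repository.

```python
from typing import List, Tuple


def pick_expert(
    timestep: float,
    num_train_timesteps: int = 1000,   # assumed range, for illustration only
    boundary_ratio: float = 0.875,     # assumed switch point, for illustration only
) -> str:
    """Choose which set of transformer weights handles this denoising step."""
    boundary = boundary_ratio * num_train_timesteps
    return "high_noise" if timestep >= boundary else "low_noise"


def plan_schedule(
    num_steps: int,
    num_train_timesteps: int = 1000,
    boundary_ratio: float = 0.875,
) -> List[Tuple[float, str]]:
    """Pair linearly spaced timesteps (noisiest first) with the chosen expert."""
    timesteps = [num_train_timesteps * (1 - i / num_steps) for i in range(num_steps)]
    return [(t, pick_expert(t, num_train_timesteps, boundary_ratio)) for t in timesteps]


for t, expert in plan_schedule(10):
    print(f"t={t:7.1f} -> {expert}")
```

Under these assumed values only the earliest, noisiest steps go to the high-noise weights, and the rest of the trajectory runs on the low-noise weights.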

Key highlights

Supports seven task types through one renderer: t2i, i2i, t2v, v2v, mv2v, rv2v, r2v
Two weight loading modes: a self-contained diffusers-format bundle (recommended), or separate Wan2.2 base + Bernini-R checkpoints
Multi-GPU inference via Ulysses sequence parallel (8-way in examples); single-GPU fallback for image tasks
Optional GPT-based prompt enhancer via OpenAI-compatible API
Gradio demo included; runs at 480p/16fps by default, with examples up to 720p/24fps
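
As a rough illustration of the idea behind Ulysses-style sequence parallelism, the pure-Python simulation below starts with the sequence sharded across ranks, regroups the data so each rank holds the full sequence for a subset of attention heads (the all-to-all step), runs ordinary attention per head, and regroups back. It simulates ranks with nested lists in a single process and is not Bernini's implementation. All function names are made up for this sketch.

```python
import math
import random


def attention(q, k, v):
    """Single-head softmax attention; q, k, v are lists of equal-length vectors."""
    dim = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(dim)])
    return out


def full_attention(q, k, v):
    """Reference multi-head attention on one device; tensors are [seq][head][dim]."""
    heads = len(q[0])
    per_head = [attention([t[h] for t in q], [t[h] for t in k], [t[h] for t in v])
                for h in range(heads)]
    return [[per_head[h][s] for h in range(heads)] for s in range(len(q))]


def split_seq(x, world):
    """Shard [seq][head][dim] along the sequence axis, one chunk per rank."""
    n = len(x) // world
    return [x[r * n:(r + 1) * n] for r in range(world)]


def seq_to_head_shards(shards, world):
    """Simulated all-to-all: sequence-sharded -> head-sharded.

    Input  shards[rank][local_seq][head]  (part of the sequence, all heads)
    Output result[rank][local_head][seq]  (full sequence, some heads)
    """
    per_rank = len(shards[0][0]) // world
    return [[[tok[h] for shard in shards for tok in shard]
             for h in range(r * per_rank, (r + 1) * per_rank)]
            for r in range(world)]


def ulysses_attention(q, k, v, world):
    """Sequence length and head count must both be divisible by `world`."""
    qh, kh, vh = (seq_to_head_shards(split_seq(t, world), world) for t in (q, k, v))
    # Each rank attends over the full sequence, but only for its own heads.
    local = [[attention(qh[r][i], kh[r][i], vh[r][i]) for i in range(len(qh[r]))]
             for r in range(world)]
    # Simulated second all-to-all: back to sequence shards that hold every head.
    n = len(q) // world
    return [[[head_out[s] for rank in local for head_out in rank]
             for s in range(r * n, (r + 1) * n)]
            for r in range(world)]


def random_tensor(seq, heads, dim, rng):
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(heads)]
            for _ in range(seq)]
```

The point of the regrouping is that attention needs every token of a head at once but never mixes heads, so trading a sequence split for a head split lets each rank run unmodified attention on its share.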

Caveats

The semantic planner itself is not open-sourced — only the renderer weights and inference code are available
Hardware expectations are steep: Hopper GPUs recommended for FlashAttention-3; CUDA 12.4 and Python 3.11.2 are essentially pinned
The “first tier” video editing claim comes from the authors’ own arena platform, not an independent benchmark
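
A quick pre-flight check along these lines can flag a hardware or interpreter mismatch before a long install. This is a sketch, not part of the Bernini repository: the helper names are hypothetical, it assumes a Hopper GPU reports compute capability 9.x, and it compares the Python pin only at the 3.11 minor-version level.

```python
import sys


def is_hopper(capability):
    """True for compute capability 9.x (Hopper, e.g. H100); an A100 reports (8, 0)."""
    major, _minor = capability
    return major == 9


def python_matches(version_info=None, pinned=(3, 11)):
    """Compare only major.minor against the assumed pin."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) == pinned


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def preflight():
    """Collect human-readable warnings; an empty list means nothing was flagged."""
    notes = []
    capability = detect_capability()
    if capability is None:
        notes.append("no CUDA GPU detected through PyTorch")
    elif not is_hopper(capability):
        notes.append(f"GPU capability {capability} is not Hopper; "
                     "FlashAttention-3 may be unavailable")
    if not python_matches():
        notes.append(f"Python {sys.version_info.major}.{sys.version_info.minor} "
                     "differs from the 3.11 pin")
    return notes


for note in preflight():
    print("warning:", note)
```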

Verdict

Worth a look if you’re doing research in structured video editing or building on Wan2.2 and want a pretrained renderer with broad task coverage. Skip it if you need the full planner-renderer system end-to-end, or if your hardware tops out at an A100 and you were hoping for the fastest path.

Frequently asked

What is bytedance/Bernini? A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.
Is Bernini open source? Yes. bytedance/Bernini is open source, released under the Apache-2.0 license.
What language is Bernini written in? bytedance/Bernini is primarily written in Python.
How popular is Bernini? bytedance/Bernini has 1.2k stars on GitHub, and its star trend is currently steady.
Where can I find Bernini? bytedance/Bernini is on GitHub at https://github.com/bytedance/Bernini.