Is CogVideo open source?

Yes — zai-org/CogVideo is open source, released under the Apache-2.0 license.

What language is CogVideo written in?

zai-org/CogVideo is primarily written in Python.

How popular is CogVideo?

zai-org/CogVideo has 12.9k stars on GitHub.

Where can I find CogVideo?

zai-org/CogVideo is on GitHub at https://github.com/zai-org/CogVideo.

← all repositories

zai-org/CogVideo

Text-to-video models that won't melt your desktop GPU

CogVideoX is Zhipu AI's open-source video generation suite, offering text-to-video, image-to-video, and video continuation models small enough to run on consumer hardware.

★12.9k stars Python Image · Video · Audio

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

CogVideoX is an open-source video generation family from Zhipu AI. It turns text prompts or still images into short video clips, and can also continue existing footage. The project spans several model sizes—from the lightweight CogVideoX-2B up to the newer CogVideoX1.5-5B series—alongside the original research-grade CogVideo model from ICLR 2023.

The interesting bit

The project explicitly targets modest hardware: the 2B model reportedly runs on a GTX 1080TI, while the 5B variant targets an RTX 3060. It also ships with a custom 3D Causal VAE that the authors claim reconstructs video “with almost no loss,” and supports quantized inference via Diffusers and TorchAO.

Key highlights

Three generation modes: text-to-video, image-to-video (CogVideoX-5B-I2V), and video continuation.
Fine-tuning ecosystem includes LoRA support, DDIM inversion, and the separate CogKit framework for deeper customization.
Frame counts must follow strict arithmetic (8N+1 or 16N+1), and resolutions are capped at 720×480 for older models or up to 1360×768 for the 1.5 series.
The 2B model carries an Apache 2.0 license; the README does not clearly state terms for larger weights.

Caveats

Prompt engineering is essentially mandatory: the README states you should first rewrite prompts using a large language model like GLM-4 or GPT-4, as the video model was trained on long, optimized captions.
Hardware requirements are low for inference, but the README warns to strictly follow requirements.txt and limits Python to 3.10–3.12.

Verdict

A solid choice for researchers and hobbyists who want to tinker with open video generation on consumer GPUs. Less appealing if you expect turnkey results without first chaining another LLM to rewrite your prompts.

Frequently asked

What is zai-org/CogVideo?: CogVideoX is Zhipu AI's open-source video generation suite, offering text-to-video, image-to-video, and video continuation models small enough to run on consumer hardware.
Is CogVideo open source?: Yes — zai-org/CogVideo is open source, released under the Apache-2.0 license.
What language is CogVideo written in?: zai-org/CogVideo is primarily written in Python.
How popular is CogVideo?: zai-org/CogVideo has 12.9k stars on GitHub.
Where can I find CogVideo?: zai-org/CogVideo is on GitHub at https://github.com/zai-org/CogVideo.