genmoai/mochi
An open-source, state-of-the-art video generation model that converts text prompts into videos with high-fidelity motion, using a diffusion-based architecture with a VAE.

Mochi 1 preview is a foundation model for text-to-video generation, released by Genmo under the Apache 2.0 license. The model combines a diffusion architecture with a VAE (variational autoencoder) for video synthesis, achieving high-fidelity motion and strong prompt adherence. It supports inference on consumer GPUs, integration with ComfyUI, and LoRA fine-tuning for customization. Users can run the model through a Gradio UI or programmatically through inference scripts.
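
As a rough illustration of programmatic use, the sketch below generates a short clip from a text prompt. It is not the repository's own inference script: it assumes the Hugging Face `diffusers` integration (`MochiPipeline`) and the `genmo/mochi-1-preview` checkpoint on the Hub, neither of which is mentioned above. The function name `generate_video`, the output path, and the default frame count and frame rate are hypothetical choices made for this example. The CPU offload and VAE tiling calls are there to reduce GPU memory use, which may help on consumer hardware.

```python
# Minimal sketch, assuming the Hugging Face diffusers integration of Mochi.
# This is NOT the repository's own inference script.

MODEL_ID = "genmo/mochi-1-preview"  # assumed Hub checkpoint name


def generate_video(prompt, output_path="mochi.mp4", num_frames=19, fps=30):
    """Generate a short clip from a text prompt and save it as a video file.

    The name, defaults, and output path are illustrative, not from the source.
    """
    # Heavy dependencies are imported lazily so this module can be imported
    # on machines without torch or diffusers installed.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    # Load the pipeline weights in bfloat16 to reduce memory use.
    pipe = MochiPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    # Memory-saving options: keep idle submodules on the CPU and decode the
    # VAE output in tiles rather than all at once.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    # Run the diffusion sampler; the first batch entry is a list of frames.
    frames = pipe(prompt, num_frames=num_frames).frames[0]

    # Write the frames to a video file and return its path.
    return export_to_video(frames, output_path, fps=fps)


if __name__ == "__main__":
    generate_video("A close-up of a hummingbird hovering near a red flower")
```

Running the script downloads the model weights on first use and needs a CUDA-capable GPU, so generation time and memory requirements depend heavily on hardware and on `num_frames`.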