Phantom-video/Phantom
A text-to-video generation model that maintains subject consistency across generated videos using cross-modal alignment.

Phantom is a video generation model developed by ByteDance. It generates videos from text prompts while keeping the subject's appearance consistent from frame to frame, and it uses cross-modal alignment to keep the generated video coherent with the text description. The project provides model weights, training code, and inference tools, along with a companion dataset, Phantom-Data, intended to improve subject consistency in generative video models.