test-time-training/ttt-video-dit

A text-to-video generation model based on CogVideoX that extends video context to 63 seconds using Test-Time Training layers.

This repository provides training and inference code for a diffusion transformer that generates videos up to 63 seconds long. The architecture adapts the CogVideoX 5B model by adding Test-Time Training (TTT) layers, which model long-range dependencies across the full video, while the original attention layers are retained and applied locally to each 3-second segment. To extend the context, the model is fine-tuned in five stages on progressively longer videos (3s, 9s, 18s, 30s, and 63s).
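The staged schedule and the 3-second local segments can be illustrated with a small sketch. This is not code from the repository: the names `Stage`, `build_schedule`, and `segment_bounds` are hypothetical, and the sketch assumes each video splits evenly into non-overlapping 3-second segments, which the description implies but does not state outright.

```python
from dataclasses import dataclass

SEGMENT_SECONDS = 3                  # local attention window, per the description
STAGE_SECONDS = [3, 9, 18, 30, 63]   # fine-tuning stages, per the description


@dataclass(frozen=True)
class Stage:
    """One fine-tuning stage (hypothetical container, not the repo's API)."""
    video_seconds: int
    num_segments: int


def build_schedule(stage_seconds=STAGE_SECONDS, segment_seconds=SEGMENT_SECONDS):
    """Return the number of local segments per stage.

    Assumes every stage length is an exact multiple of the segment length.
    """
    schedule = []
    for seconds in stage_seconds:
        if seconds % segment_seconds != 0:
            raise ValueError(f"{seconds}s is not a multiple of {segment_seconds}s")
        schedule.append(Stage(seconds, seconds // segment_seconds))
    return schedule


def segment_bounds(video_seconds, segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) times in seconds for each local segment.

    In this sketch, attention would run within one (start, end) window at a
    time, while a TTT layer would process all windows as a single sequence.
    """
    return [
        (start, start + segment_seconds)
        for start in range(0, video_seconds, segment_seconds)
    ]


if __name__ == "__main__":
    for stage in build_schedule():
        print(f"{stage.video_seconds:>2}s video: {stage.num_segments} segment(s)")
```

Under these assumptions the final 63-second stage spans 21 segments, so the TTT layers carry context across 21 locally attended windows.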
