Lightricks/LTX-2

A 22B-parameter video studio that fits in your GPU (barely)

Lightricks open-sources the full inference stack and LoRA trainer for their DiT-based audio-video model, complete with camera-control LoRAs and HDR output pipelines.

Velocity (7d): +46 ★ / day · Trend: steady

What it does: LTX-2 is a 22-billion-parameter diffusion transformer that generates synchronized audio and video from text, images, or existing clips. This repository provides the official Python inference code and LoRA training tools, plus ten pre-built pipelines ranging from quick single-stage prototyping to two-stage production-quality generation with spatial upsampling.

The interesting bit: The model ships with a small arsenal of task-specific LoRAs, covering camera dolly left/right, jib up/down, pose control, motion tracking, and even lip-dubbing with speaker identity matching. There’s also an HDR pipeline that outputs linear float frames via LogC3 inverse decode, meant for EXR export and professional tonemapping. Lightricks essentially bundled a post house’s worth of tooling into one repository.
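To make "LogC3 inverse decode" concrete, here is a minimal sketch of the ARRI LogC3 curve and its inverse, using the EI 800 constants ARRI publishes. It is an illustration only: the source does not say which LogC3 parameter set LTX-2 uses or how its pipeline applies it, and the function names are hypothetical.

```python
import math

# ARRI LogC3 (EI 800) constants as published by ARRI.
# Assumption: the write-up only says "LogC3 inverse decode"; the exact
# parameter set used by LTX-2 is not stated.
CUT = 0.010591
A = 5.555556
B = 0.052272
C = 0.247190
D = 0.385537
E = 5.367655
F = 0.092809


def logc3_encode(x: float) -> float:
    """Scene-linear value -> LogC3 code value."""
    if x > CUT:
        return C * math.log10(A * x + B) + D
    return E * x + F


def logc3_decode(t: float) -> float:
    """LogC3 code value -> scene-linear value (the inverse decode)."""
    if t > E * CUT + F:
        return (10.0 ** ((t - D) / C) - B) / A
    return (t - F) / E


# 18% grey round-trips through the curve; code values near 1.0 decode
# to linear values far above 1.0, which is why float output is needed.
mid_grey_code = logc3_encode(0.18)
print(mid_grey_code, logc3_decode(mid_grey_code), logc3_decode(1.0))
```

The decoded values are unbounded scene-linear floats rather than display-referred 0-1 values, which fits the stated goal of EXR export followed by a separate tonemapping step.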

Key highlights

  • Ten pipelines: text/image-to-video, audio-to-video, keyframe interpolation, video retakes, HDR output, lip dubbing
  • FP8 quantization support (fp8-cast for bf16 checkpoints, fp8-scaled-mm for TensorRT-LLM on Hopper)
  • FlashAttention 4 for datacenter Blackwell (B200), xFormers for other CUDA GPUs
  • Distilled variant runs inference in 8+4 steps; gradient estimation can cut steps from 40 to 20-30
  • Automatic prompt enhancement built into all pipelines
  • ComfyUI integration available via separate repo
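As a rough sense of scale for the step counts quoted above, the sketch below compares them against the 40-step baseline. It assumes "8+4" means 8 steps in the first stage plus 4 in the second, which the source does not spell out, and it counts steps only: per-step cost likely differs between stages, so this is not a wall-clock estimate.

```python
# Step counts taken from the highlights; everything else is illustrative.
BASELINE_STEPS = 40
DISTILLED_STEPS = 8 + 4        # assumption: 8 first-stage + 4 second-stage steps
GRAD_EST_STEPS = (20, 30)      # quoted range with gradient estimation


def relative_steps(steps: int, baseline: int = BASELINE_STEPS) -> float:
    """Fraction of the baseline's denoising steps that a schedule uses."""
    return steps / baseline


print("distilled:", relative_steps(DISTILLED_STEPS))
print("gradient estimation:", [relative_steps(s) for s in GRAD_EST_STEPS])
```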

Caveats

  • Requires multiple large safetensors downloads (22B base model, spatial upscaler, distilled LoRA, Gemma 3 text encoder)
  • Temporal upscaler is “supported by the model” but not yet required by any pipeline—future work
  • FlashAttention 4 install is pinned to a specific beta (4.0.0b9) against torch 2.9.1+cu128; newer betas break on consumer Blackwell
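The "fits in your GPU (barely)" framing comes down to simple arithmetic on weight storage. A back-of-the-envelope sketch, assuming 2 bytes per parameter for bf16 and 1 byte for FP8, and counting only the 22B transformer weights (the text encoder, upscaler, activations, and any quantization scale factors are excluded):

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 22e9  # the 22B figure from the write-up

for dtype in ("bf16", "fp8"):
    print(dtype, weight_gb(NUM_PARAMS, dtype), "GB")
```

Under these assumptions the weights alone take about 44 GB in bf16 and about 22 GB in FP8, which is why the FP8 options matter for single-GPU use.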

Verdict: Worth a look if you’re doing local video generation research or need fine-grained camera/pose control without building pipelines from scratch. Skip it if you’re hoping for a lightweight, single-file demo; this is a production-weight stack with production-weight hardware requirements.
