taco-group/SparkVSR

Let users paint keyframes, let AI fill the rest

SparkVSR turns sparse, user-corrected frames into full video super-resolution without the black-box problem.


What it does

SparkVSR is a video super-resolution system built on CogVideoX1.5-5B-I2V that lets you manually fix up a few keyframes with any image SR model you like, then propagates those corrections across the entire video. It stays anchored to the original low-res motion, so you don’t get drifting hallucinations.
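The workflow described above can be sketched as plain orchestration code. This is a minimal illustration and not SparkVSR's API. `sparse_reference_vsr`, `image_sr`, `video_model` and the toy helpers are hypothetical names, frames are stand-in lists of floats, and the video model is passed in as a callable rather than loaded from the released CogVideoX-based weights.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in: a real pipeline would pass image tensors, not float lists.
Frame = List[float]


def sparse_reference_vsr(
    lr_frames: Sequence[Frame],
    keyframe_ids: Sequence[int],
    image_sr: Callable[[Frame], Frame],
    video_model: Callable[[Sequence[Frame], Dict[int, Frame]], List[Frame]],
) -> List[Frame]:
    """Upscale a few keyframes with any image SR model, then hand the full
    low-res clip plus those sparse references to a video model (hypothetical)."""
    # Only the chosen keyframes go through the swappable image SR model;
    # duplicate and out-of-range indices are dropped.
    refs = {
        i: image_sr(lr_frames[i])
        for i in sorted(set(keyframe_ids))
        if 0 <= i < len(lr_frames)
    }
    # The video model always receives every low-res frame, so motion comes
    # from the LR input; `refs` is sparse and may be empty (blind SR).
    return video_model(lr_frames, refs)


def toy_image_sr(frame: Frame) -> Frame:
    # Hypothetical "image SR model": any per-frame function can be plugged in.
    return [2.0 * x + 1.0 for x in frame]


def toy_video_model(lr: Sequence[Frame], refs: Dict[int, Frame]) -> List[Frame]:
    # Hypothetical "video model": use the reference where one exists,
    # a naive 2x scaling elsewhere.
    return [refs.get(t, [2.0 * x for x in f]) for t, f in enumerate(lr)]


out = sparse_reference_vsr([[1.0], [2.0], [3.0]], [0, 2, 2, 9], toy_image_sr, toy_video_model)
print(out)  # [[3.0], [4.0], [7.0]]
```

Keeping the image SR step as a plain callable mirrors the "any image SR model you like" point: swapping models changes only what goes into `refs`, never the low-res frames that carry the motion.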

The interesting bit

The two-stage training is the actual machinery. Stage 1 fuses low-res video latents with sparse high-res keyframe latents in latent space to propagate detail across frames; Stage 2 then refines perceptual detail in pixel space. A reference-free guidance mechanism also lets the model degrade gracefully to blind SR when your keyframes are missing or garbage, so there is no hard failure mode.
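The Stage 1 fusion and the blind-SR fallback can be illustrated with a small sketch. The assumptions are significant: per-frame latents are flat float lists, fusion is plain concatenation of LR latent, reference latent (zeros when absent) and a binary mask, and the reference-free guidance is modelled as classifier-free-guidance-style mixing of a with-reference and a without-reference prediction. The paper's actual fusion layers and guidance formula may differ, Stage 2's pixel-space refinement is not shown, and `fuse_latents`, `guided_prediction` and `toy_denoise` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

Latent = List[float]  # hypothetical flat stand-in for one frame's latent


def fuse_latents(lr_latents: Sequence[Latent], ref_latents: Dict[int, Latent]) -> List[Latent]:
    """Concatenate each LR latent with its HR keyframe latent (zeros when the
    frame has no reference) and a mask value of 1.0 or 0.0."""
    fused = []
    for t, lr in enumerate(lr_latents):
        ref = ref_latents.get(t)
        if ref is None:
            fused.append(list(lr) + [0.0] * len(lr) + [0.0])
        elif len(ref) != len(lr):
            raise ValueError(f"frame {t}: reference latent size mismatch")
        else:
            fused.append(list(lr) + list(ref) + [1.0])
    return fused


def guided_prediction(
    denoise: Callable[[List[Latent]], List[Latent]],
    lr_latents: Sequence[Latent],
    ref_latents: Dict[int, Latent],
    scale: float = 1.0,
) -> List[Latent]:
    """CFG-style mixing (an assumption): uncond + scale * (cond - uncond)."""
    # Reference-free branch: the model sees only the LR latents (blind SR).
    uncond = denoise(fuse_latents(lr_latents, {}))
    if not ref_latents:
        return uncond  # no usable keyframes: fall back to blind SR
    cond = denoise(fuse_latents(lr_latents, ref_latents))
    return [
        [u + scale * (c - u) for u, c in zip(u_frame, c_frame)]
        for u_frame, c_frame in zip(uncond, cond)
    ]


def toy_denoise(fused: List[Latent]) -> List[Latent]:
    # Toy "model": adds the reference half onto the LR half, ignores the mask.
    out = []
    for f in fused:
        d = (len(f) - 1) // 2
        out.append([a + b for a, b in zip(f[:d], f[d:2 * d])])
    return out


lr = [[1.0, 1.0], [2.0, 2.0]]
print(guided_prediction(toy_denoise, lr, {0: [0.5, 0.5]}, scale=2.0))  # [[2.0, 2.0], [2.0, 2.0]]
print(guided_prediction(toy_denoise, lr, {}))  # [[1.0, 1.0], [2.0, 2.0]]
```

With `scale=1.0` the mix reduces to the with-reference prediction, and an empty reference dict skips the second pass entirely. That empty-dict path is what "degrades to blind SR" looks like under this sketch's assumptions.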

Key highlights

  • Built on CogVideoX1.5-5B-I2V; weights and ComfyUI node both released
  • Three inference modes: API-driven (nano-banana-pro), local PiSA-SR references, or pure blind SR fallback
  • Flexible keyframe selection: manual, codec I-frame extraction, or random sampling
  • Reports improvements over baselines of up to 24.6% on CLIP-IQA, 21.8% on DOVER and 5.6% on MUSIQ (per the paper)
  • Generalizes out-of-the-box to old-film restoration and video style transfer
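The three keyframe-selection strategies in the list can be sketched as one helper. Everything here is hypothetical rather than SparkVSR's actual CLI or config: the mode names `"manual"`, `"iframe"` and `"random"`, the function names, and the assumption that I-frame positions come from ffprobe's per-frame picture types. One way to get those is `ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv=p=0 input.mp4`, which should print one `I`, `P` or `B` per decoded frame.

```python
import random
from typing import List, Optional, Sequence


def iframe_indices(ffprobe_csv: str) -> List[int]:
    """Return indices of I-frames from ffprobe csv output (one pict_type per line)."""
    # Keep only the first csv field in case a line carries trailing commas.
    types = [line.split(",")[0].strip() for line in ffprobe_csv.splitlines() if line.strip()]
    return [i for i, t in enumerate(types) if t == "I"]


def select_keyframes(
    num_frames: int,
    mode: str,
    *,
    manual: Optional[Sequence[int]] = None,
    ffprobe_csv: str = "",
    k: int = 4,
    seed: int = 0,
) -> List[int]:
    """Pick keyframe indices by a hypothetical mode name; result is sorted and deduplicated."""
    if mode == "manual":
        picks = list(manual or [])
    elif mode == "iframe":
        picks = iframe_indices(ffprobe_csv)
    elif mode == "random":
        rng = random.Random(seed)  # seeded so a run can be reproduced
        picks = rng.sample(range(num_frames), min(k, num_frames))
    else:
        raise ValueError(f"unknown keyframe mode: {mode!r}")
    # Drop out-of-range indices; an empty result would mean blind SR.
    return sorted({i for i in picks if 0 <= i < num_frames})


print(select_keyframes(10, "manual", manual=[7, 2, 2, 42]))  # [2, 7]
print(select_keyframes(5, "iframe", ffprobe_csv="I\nP\nB\nI\nP\n"))  # [0, 3]
```

Sorting and deduplicating keeps the output stable regardless of how indices were supplied. How useful codec I-frames are as SR keyframes depends on the source's GOP settings, since those decide how far apart the I-frames sit.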

Caveats

  • Training demands 4×A100 GPUs; this is not a hobbyist setup
  • The README’s benchmark percentages come from the paper, not independently verifiable in the repo
  • The Stage-1 checkpoint is explicitly not meant for inference, so it is easy to grab the wrong weights

Verdict

Video restoration researchers and VFX pipelines that need human-in-the-loop control should look here. If you just want one-click upscaling, the complexity is overkill; a simpler VSR model will serve you better.
