OpenImagingLab/FlashVSR

Diffusion video upscaling that finally stops for breath

A one-step diffusion model that streams 768×1408 video at ~17 FPS on a single A100 by saying no to redundant attention.

★ 1.7k stars | Python | Computer Vision | Image · Video · Audio | Inference · Serving

What it does

FlashVSR upscales low-resolution video in real time using a distilled one-step diffusion pipeline. It targets 4× super-resolution and processes frames as a stream rather than chewing through entire clips offline. The authors claim ~17 FPS at 768×1408 on one A100, with roughly 12× speedup over prior one-step diffusion VSR approaches.
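To put the claimed figures in perspective, the arithmetic can be sketched as below. This is a back-of-envelope calculation, not a benchmark: it assumes 768×1408 is the output resolution and that 4× applies to each spatial dimension, and the function name is hypothetical.

```python
def throughput_summary(out_h=768, out_w=1408, fps=17.0, scale=4):
    """Derive per-frame budget and pixel throughput from the claimed figures."""
    # Assumption: the quoted resolution is the output, upscaled `scale`x per side.
    in_h, in_w = out_h // scale, out_w // scale
    return {
        "input_size": (in_h, in_w),
        "ms_per_frame": 1000.0 / fps,
        "output_megapixels_per_s": out_h * out_w * fps / 1e6,
        "pixels_out_per_pixel_in": scale * scale,
    }


summary = throughput_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under those assumptions, each frame gets a budget of about 59 ms and the model emits roughly 18.4 megapixels per second from a 192×352 input.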

The interesting bit

The speed comes from three deliberate constraints: a three-stage distillation that collapses the usual multi-step diffusion into a single pass, locality-constrained sparse attention that limits each query to local regions rather than the whole frame, and a stripped-down conditional decoder. The sparse attention is particularly notable: third-party ComfyUI ports that omit it and fall back to dense attention visibly degrade quality, which the authors document with side-by-side examples.
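The general idea behind a locality constraint can be illustrated with a toy block mask. This is a minimal sketch, not the project's kernel: the grid shape, the `radius` parameter, and both function names are illustrative assumptions, and FlashVSR's actual block-selection rule may differ.

```python
def local_block_mask(rows, cols, radius):
    """mask[q][k] is True when key block k lies within `radius` blocks of query block q."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    return [
        [max(abs(qr - kr), abs(qc - kc)) <= radius for (kr, kc) in coords]
        for (qr, qc) in coords
    ]


def density(mask):
    """Fraction of query/key block pairs that are actually attended."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


# Hypothetical 6x11 grid of blocks; each query sees only its immediate neighbours.
mask = local_block_mask(rows=6, cols=11, radius=1)
frac = density(mask)
print(f"attended block pairs: {frac:.1%}")
```

In this toy setup only about 11% of block pairs are computed, which is the kind of saving a dense-attention fallback gives up.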

Key highlights

One-step streaming diffusion for video SR, not the usual offline batch processing
Locality-constrained sparse attention cuts compute and bridges train/test resolution gaps
Tiny conditional decoder keeps reconstruction fast without (they claim) sacrificing quality
VSR-120K dataset: 120k videos + 180k images for joint training, though not yet released
v1.1 weights available on Hugging Face with “enhanced stability + fidelity”
Active third-party ecosystem: multiple ComfyUI nodes, cloud APIs, though quality varies
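The difference between streaming and offline batch processing can be sketched with a generator that emits each frame as soon as it is ready and keeps only a bounded history. This is a generic pattern under stated assumptions, not FlashVSR's pipeline: `stream_upscale`, `fake_upscale`, and the `context` parameter are hypothetical stand-ins.

```python
from collections import deque


def stream_upscale(frames, upscale_frame, context=2):
    """Yield upscaled frames one at a time, keeping at most `context` past frames."""
    history = deque(maxlen=context)  # bounded memory regardless of clip length
    for frame in frames:
        out = upscale_frame(frame, list(history))
        history.append(frame)
        yield out


def fake_upscale(frame, past):
    """Placeholder for a model call: returns a dummy value and the context size used."""
    return (frame * 4, len(past))


results = list(stream_upscale(range(5), fake_upscale))
print(results)
```

Because output is produced per frame, latency does not grow with clip length, whereas an offline approach has to hold the whole clip before returning anything.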

Caveats

Block-sparse attention compilation is memory-hungry and officially tested only on A100/A800; H200 works but with limited acceleration, and RTX 40/50 series compatibility is unknown
The project is “primarily designed and optimized for 4× SR”; other scales are second-class citizens
VSR-120K dataset is still unreleased, so replication of training from scratch is currently impossible
Third-party implementations (including some popular ComfyUI nodes) have shipped without the sparse attention module, producing visibly worse results

Verdict

Worth a look if you need diffusion-level video upscaling without the usual multi-minute-per-frame tax. Skip it if you’re on consumer hardware or need flexible scaling factors: official testing covers only A100/A800, and the 4× focus is a real constraint, not a suggestion.

Frequently asked

What is OpenImagingLab/FlashVSR?: A one-step diffusion model that streams 768×1408 video at ~17 FPS on a single A100 by saying no to redundant attention.
Is FlashVSR open source?: Yes — OpenImagingLab/FlashVSR is open source, released under the Apache-2.0 license.
What language is FlashVSR written in?: OpenImagingLab/FlashVSR is primarily written in Python.
How popular is FlashVSR?: OpenImagingLab/FlashVSR has 1.7k stars on GitHub.
Where can I find FlashVSR?: OpenImagingLab/FlashVSR is on GitHub at https://github.com/OpenImagingLab/FlashVSR.