bytedance/Lance

ByteDance's 3B-parameter do-it-all visual model

A single small model that generates, edits, and understands images and video—no separate pipelines required.

Velocity (7d): +48 ★/day · Trend: steady

What it does

Lance is a 3B-active-parameter unified multimodal model from ByteDance that handles image and video understanding, generation, and editing in one framework. You can prompt it for text-to-image, text-to-video, image-to-video, plus editing tasks for both modalities, or ask it to describe what it sees.

The interesting bit

The “native unified” claim is the hook: instead of gluing together a diffusion model, an LLM, and an editor, Lance is trained from scratch on all these tasks simultaneously using a staged multi-task recipe. The authors explicitly call it a “research artifact”: they trained on up to 128 A100s, capped at 768×768 images and 480p/12 FPS video, and want the community to stress-test whether training understanding, generation, and editing jointly actually pays off at small scale.

Key highlights

  • 3B active parameters, competitive on image generation, editing, and video generation benchmarks (per the README’s claim; no specific numbers shown)
  • Supports 7 task types: t2i, t2v, i2v, image_edit, video_edit, x2t_image, x2t_video
  • Now runs in vLLM-Omni for faster inference; Gradio demo and HuggingFace Space available
  • Requires 40GB+ VRAM for inference (A100 territory)
  • Fine-tuning code not yet released
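
As a rough illustration of the highlights above, here is a minimal pre-flight check that rejects a request before any model weights are loaded. The seven task names and the 40GB figure come from the list above. The `preflight` function, its arguments, and its messages are hypothetical and are not part of Lance's actual API.

```python
# Hypothetical helper, not part of the Lance codebase.

# The seven task identifiers listed in the highlights.
SUPPORTED_TASKS = frozenset({
    "t2i", "t2v", "i2v",
    "image_edit", "video_edit",
    "x2t_image", "x2t_video",
})

# Inference requirement quoted in the highlights (assumed to be a hard floor).
MIN_VRAM_GB = 40


def preflight(task: str, vram_gb: float) -> list[str]:
    """Return a list of problems; an empty list means the request looks runnable."""
    problems = []
    if task not in SUPPORTED_TASKS:
        problems.append(f"unknown task {task!r}")
    if vram_gb < MIN_VRAM_GB:
        problems.append(
            f"{vram_gb:g} GB VRAM is below the {MIN_VRAM_GB} GB minimum"
        )
    return problems
```

Calling `preflight("t2v", 80)` returns an empty list, while `preflight("i2v", 24)` returns a single message about insufficient VRAM.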

Caveats

  • Output quality “may vary across prompts, resolutions, duration, motion complexity, and editing scenarios”—the authors’ own warning
  • Flash Attention compilation can be finicky; README points to third-party wheels “for reference only”
  • Trained up to 480p video; don’t expect cinema-grade generation
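
To make the resolution caveat concrete, the sketch below checks whether a requested output stays inside the training caps quoted earlier (768×768 images, 480p/12 FPS video). It assumes "480p" means a short side of at most 480 pixels. The `within_trained_range` function is hypothetical, and nothing here says the model refuses larger sizes, only that they fall outside what it was trained on.

```python
# Hypothetical helper, not part of the Lance codebase.

IMAGE_MAX_SIDE = 768        # images were capped at 768x768 during training
VIDEO_MAX_SHORT_SIDE = 480  # assumption: "480p" means short side <= 480 px
VIDEO_MAX_FPS = 12          # video was capped at 12 FPS during training


def within_trained_range(kind, width, height, fps=None):
    """Return True if the requested output stays inside the training caps."""
    if kind == "image":
        return max(width, height) <= IMAGE_MAX_SIDE
    if kind == "video":
        fps_ok = fps is None or fps <= VIDEO_MAX_FPS
        return min(width, height) <= VIDEO_MAX_SHORT_SIDE and fps_ok
    raise ValueError(f"kind must be 'image' or 'video', got {kind!r}")
```

A 854×480 clip at 12 FPS passes. A 1280×720 clip, or the same 480p clip at 24 FPS, does not.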

Verdict

Worth a spin if you’re researching unified multimodal architectures or need a single model that covers multiple visual tasks without model-swapping. Skip it if you need production polish or higher resolutions, or if your GPU has less than 40GB of VRAM.
