Is lingbot-world open source?

Yes — Robbyant/lingbot-world is open source, released under the Apache-2.0 license.

What language is lingbot-world written in?

Robbyant/lingbot-world is primarily written in Python.

How popular is lingbot-world?

Robbyant/lingbot-world has 4.3k stars on GitHub and is currently cooling off.

Where can I find lingbot-world?

Robbyant/lingbot-world is on GitHub at https://github.com/Robbyant/lingbot-world.

Robbyant/lingbot-world

An open-source world model that runs fast enough to play

LingBot-World turns a single image and a text prompt into an interactive, minute-long simulated world at 16 FPS with under one second of latency.

★ 4.3k stars · Python · Image / Video / Audio · Domain Apps

Velocity (7d): +8.9 ★/day · Trend: cooling

What it does

LingBot-World is an image-to-video world simulator built on top of Wan2.2. Feed it a still image, a text description, and optionally camera poses or action strings, and it generates extended video sequences while maintaining visual consistency across hundreds of frames. The project ships three model variants: a base camera-pose version, an action-controlled version with simplified keyboard-style commands, and a fast variant that uses chunked causal inference with KV caching for real-time interaction.

The interesting bit

The “Fast” model is the unusual part. Instead of generating all frames at once, it generates video chunk by chunk and reuses cached attention keys and values (KV caching) from earlier chunks, which is what lets it hit sub-second latency at 16 FPS. That makes it usable for interactive applications rather than only batch rendering. The project also explicitly aims to narrow the gap between open-source and closed-source world models, a nice change from the usual researchware that stops at the paper.
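
To make the chunk-plus-cache idea concrete, here is a toy sketch in plain Python. It is not LingBot-World's code: the two-number "frames", the to_key and to_value stand-ins for learned projections, and the ChunkedGenerator class are all hypothetical, and attention is frame-level causal for simplicity (a real model may let frames inside one chunk see each other). It only shows that caching keys and values from earlier chunks gives the same result as recomputing everything from scratch.

```python
import math

def to_key(frame):
    # Hypothetical stand-in for a learned key projection.
    return [2.0 * x for x in frame]

def to_value(frame):
    # Hypothetical stand-in for a learned value projection.
    return [x + 1.0 for x in frame]

def attend(query, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def full_causal(frames):
    """Baseline: for every frame, recompute keys/values of all frames up to it."""
    outputs = []
    for t, frame in enumerate(frames):
        keys = [to_key(f) for f in frames[: t + 1]]
        values = [to_value(f) for f in frames[: t + 1]]
        outputs.append(attend(frame, keys, values))
    return outputs

class ChunkedGenerator:
    """Processes frames chunk by chunk, keeping keys/values from earlier chunks."""

    def __init__(self):
        self.key_cache = []
        self.value_cache = []

    def step(self, chunk):
        outputs = []
        for frame in chunk:
            # Each frame is projected exactly once, then kept in the cache.
            self.key_cache.append(to_key(frame))
            self.value_cache.append(to_value(frame))
            outputs.append(attend(frame, self.key_cache, self.value_cache))
        return outputs

frames = [[0.1 * i, 0.2 * i - 0.3] for i in range(6)]
generator = ChunkedGenerator()
chunked = generator.step(frames[:2]) + generator.step(frames[2:4]) + generator.step(frames[4:])
reference = full_causal(frames)
matches = all(abs(a - b) < 1e-9 for out, ref in zip(chunked, reference) for a, b in zip(out, ref))
print(matches)
```

Each call to step only projects the frames in the new chunk, so the work per chunk stays bounded by the chunk size plus one pass over the cache, which is the property that makes interactive latency plausible.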

Key highlights

Three model tiers: Base (Cam), Base (Act) with keyboard-style action strings, and Fast for real-time use
Supports 480P and 720P output; up to ~961 frames (about a minute at 16 FPS) on sufficient GPU memory
Control via camera poses (OpenCV format), action strings like w-10,a-10,d-10, or no control signals at all
Community-provided 4-bit quantized model available for inference on limited VRAM
Apache 2.0 licensed with weights on HuggingFace and ModelScope
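
As a rough illustration of the frame arithmetic and the action-string control above, the sketch below parses a string such as w-10,a-10,d-10 and converts frame counts to seconds at 16 FPS. The reading of each token as "<key>-<frame count>" is an assumption made for this example, not something the project description confirms, and parse_actions and seconds are hypothetical helper names.

```python
FPS = 16  # frame rate quoted in the description above

def parse_actions(action_string):
    """Split 'w-10,a-10' into [('w', 10), ('a', 10)].

    Assumption: each token is '<key>-<count>' and the count is a frame count.
    """
    actions = []
    for token in action_string.split(","):
        key, _, count = token.strip().partition("-")
        if not key or not count.isdigit():
            raise ValueError(f"unrecognised action token: {token!r}")
        actions.append((key, int(count)))
    return actions

def seconds(frame_count, fps=FPS):
    """Duration in seconds of a clip with the given number of frames."""
    return frame_count / fps

actions = parse_actions("w-10,a-10,d-10")
total_frames = sum(count for _, count in actions)
print(actions)                # [('w', 10), ('a', 10), ('d', 10)]
print(seconds(total_frames))  # 1.875
print(seconds(961))           # 60.0625
```

The last line is the "about a minute" figure: 961 frames at 16 FPS is just over 60 seconds.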

Caveats

Requires a multi-GPU setup for the reference configurations (8 GPUs in the examples); single-GPU users will need the quantized model or significant patience
Built on Wan2.2, so you’re inheriting whatever installation friction that brings — flash-attention compilation, torch >= 2.4.0, etc.
The 4-bit quantized model explicitly warns of “minor degradation in visual fidelity and temporal consistency”
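
Before attempting an install, a quick pre-flight check along these lines can flag the two dependencies named above. This is a sketch, not a script from the repository: check_environment and parse_version are hypothetical helpers, and the check only looks at package metadata for torch and for the flash_attn module (the import name of the flash-attn package) without importing either.

```python
import importlib.metadata
import importlib.util
import re

MIN_TORCH = (2, 4, 0)  # from the "torch >= 2.4.0" caveat above

def parse_version(text):
    """Turn '2.4.0+cu121' into (2, 4, 0), ignoring local build suffixes."""
    numbers = re.findall(r"\d+", text.split("+")[0])[:3]
    return tuple(int(n) for n in numbers) + (0,) * (3 - len(numbers))

def check_environment():
    """Report whether torch is new enough and whether flash_attn is importable."""
    report = {"torch_ok": False, "flash_attn_installed": False}
    try:
        torch_version = importlib.metadata.version("torch")
        report["torch_ok"] = parse_version(torch_version) >= MIN_TORCH
    except importlib.metadata.PackageNotFoundError:
        pass  # torch is not installed; leave torch_ok as False
    report["flash_attn_installed"] = importlib.util.find_spec("flash_attn") is not None
    return report

print(check_environment())
```

It does not check GPU count or memory, which still need to be compared against the reference configurations by hand.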

Verdict

Worth a look if you’re building interactive world simulators, game environments, or robot learning visualizers and need something actually open-weights. Skip it if you’re hoping for a lightweight single-GPU toy — this is still very much a workstation-or-cloud project.
