jd-opensource/JoyAI-Image

One model that looks, draws, and edits—without forgetting where things go

JD's open-source JoyAI-Image fuses an 8B MLLM with a 16B diffusion transformer so understanding and generation can actually talk to each other.

★ 2.2k stars · Python · Image · Video · Audio

What it does

JoyAI-Image is a unified foundation model that handles three tasks (image understanding, text-to-image generation, and instruction-guided editing) through a single architecture. An 8B multimodal LLM and a 16B diffusion transformer share an interface, so the same model can describe a scene, render text-heavy layouts, or move objects around while keeping the background intact.

The interesting bit

The project bets on a feedback loop: better spatial understanding improves generation and editing, while generative tasks like novel-view synthesis feed sharper visual evidence back into reasoning. It is the rare “unified” model that ships actual weights for more than one task, with Diffusers and ComfyUI integrations already merged.

Key highlights

Released weights for understanding (JoyAI-Image-Und) and editing (JoyAI-Image-Edit); text-to-image and distilled variants are marked “to be released”
Emphasizes spatial reasoning—camera control, object rotation, location-specific edits, multi-view consistency
Claims strong long-text rendering: comics, dense multilingual layouts, handwritten styles
Training data pipeline is open-sourced as OpenSpatial-3M and SpatialEdit datasets
Diffusers PR merged upstream; ComfyUI nodes available; Hugging Face and ModelScope demos live

Caveats

Core text-to-image and distilled editing weights are not yet released
Requires CUDA, Python ≥3.10, and flash-attn ≥2.8.0; not a lightweight CPU toy
README is enthusiastic but thin on quantitative benchmarks or training compute details
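
The runtime requirements listed above can be checked before installing anything heavy. The following is a minimal sketch, not something taken from the project: the helper names parse_version, meets_minimum, and check_environment are hypothetical, the package is assumed to be installed under the distribution name flash-attn, and PyTorch is assumed to be the library that exposes CUDA.

```python
import importlib.metadata
import re
import sys


def parse_version(text):
    """Turn '2.8.0.post2' into (2, 8, 0); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each requirement to True, False, or None (None means it could not be checked)."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    try:
        # Assumption: the distribution is registered as "flash-attn".
        installed = importlib.metadata.version("flash-attn")
        report["flash-attn>=2.8.0"] = meets_minimum(installed, "2.8.0")
    except importlib.metadata.PackageNotFoundError:
        report["flash-attn>=2.8.0"] = False
    try:
        import torch  # assumption: PyTorch provides the CUDA runtime

        report["cuda"] = torch.cuda.is_available()
    except ImportError:
        report["cuda"] = None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

The version comparison is deliberately simple and ignores pre-release and local suffixes; for anything stricter, a dedicated version-parsing library would be the safer choice.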

Verdict

Worth watching if you need controllable image editing with spatial awareness, or if you are tired of stitching together separate captioning, generation, and inpainting pipelines. Skip it for now if you need a fully released, end-to-end text-to-image model you can ship today.

Frequently asked

What is jd-opensource/JoyAI-Image? JD's open-source JoyAI-Image fuses an 8B MLLM with a 16B diffusion transformer so understanding and generation can actually talk to each other.
Is JoyAI-Image open source? Yes, jd-opensource/JoyAI-Image is open source, released under the Apache-2.0 license.
What language is JoyAI-Image written in? jd-opensource/JoyAI-Image is primarily written in Python.
How popular is JoyAI-Image? jd-opensource/JoyAI-Image has 2.2k stars on GitHub.
Where can I find JoyAI-Image? jd-opensource/JoyAI-Image is on GitHub at https://github.com/jd-opensource/JoyAI-Image.