Is DALLE2-pytorch open source?

Yes — lucidrains/DALLE2-pytorch is open source, released under the MIT license.

What language is DALLE2-pytorch written in?

lucidrains/DALLE2-pytorch is primarily written in Python.

How popular is DALLE2-pytorch?

lucidrains/DALLE2-pytorch has 11.3k stars on GitHub.

Where can I find DALLE2-pytorch?

lucidrains/DALLE2-pytorch is on GitHub at https://github.com/lucidrains/DALLE2-pytorch.

lucidrains/DALLE2-pytorch

Open-sourcing DALL-E 2 before the hype moved to Imagen

A from-scratch PyTorch reconstruction of DALL-E 2 built for researchers who want to train the full CLIP-prior-decoder stack end-to-end.

★ 11.3k stars · Python · Image · Video · Audio

What it does

This repository implements OpenAI’s DALL-E 2 architecture in PyTorch, breaking the system into three trainable pieces: a CLIP encoder, a diffusion prior network, and a cascaded decoder. The diffusion prior is the main focus—it generates CLIP image embeddings from text embeddings, adding the “extra layer of indirection” that the paper introduces. Researchers have used the code to train functional priors and decoders, including an 800-GPU run and unconditional generation tests on Oxford flowers.

The interesting bit

Instead of generating pixels directly from text, the model first hallucinates an intermediate image embedding via a causal transformer inside the diffusion prior, then hands that embedding to a U-Net decoder. It is a deliberately indirect pipeline, and the author already concedes that the simpler Imagen architecture has since made this particular brand of complexity obsolete.
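The two-stage flow described above can be sketched in a few lines of plain Python. This is only a toy illustration of the data flow (text embedding, then image embedding, then pixels), not the repository's API: `encode_text`, `toy_denoiser`, `sample_prior`, `decode`, and `generate` are all hypothetical stand-ins, the 8-dimensional embedding is an arbitrary choice, and the refinement loop is a simple interpolation rather than a real DDPM sampler.

```python
import math
import random

EMBED_DIM = 8  # toy size chosen for readability; real CLIP embeddings are far larger


def encode_text(prompt):
    """Hypothetical stand-in for a CLIP text encoder: one unit vector per prompt."""
    rng = random.Random(prompt)  # string seeds are deterministic across runs
    vec = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def toy_denoiser(noisy, text_emb, step, total_steps):
    """Hypothetical stand-in for the prior's denoising network.

    A trained network would use `noisy` and `step`; this toy ignores them and
    pretends the clean image embedding is the text embedding reversed.
    """
    return list(reversed(text_emb))


def sample_prior(text_emb, steps=10, seed=0):
    """Start from noise and move step by step toward the predicted image embedding."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for step in range(steps, 0, -1):
        pred = toy_denoiser(x, text_emb, step, steps)
        keep = (step - 1) / steps  # share of the current sample that is retained
        x = [keep * xi + (1.0 - keep) * pi for xi, pi in zip(x, pred)]
    return x


def decode(image_emb, size=4):
    """Hypothetical stand-in for the U-Net decoder: embedding -> size x size grid in [0, 1]."""
    return [
        [0.5 + 0.5 * math.tanh(image_emb[(r * size + c) % EMBED_DIM]) for c in range(size)]
        for r in range(size)
    ]


def generate(prompt):
    """Text -> text embedding -> image embedding (prior) -> pixels (decoder)."""
    text_emb = encode_text(prompt)
    image_emb = sample_prior(text_emb)
    return decode(image_emb)
```

Calling `generate("a corgi playing a trumpet")` returns a 4 x 4 grid of values. The point of the sketch is that the decoder never sees the text embedding directly; it sees only what the prior produced.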

Key highlights

Implements the full three-stage pipeline: CLIP, diffusion prior, and cascaded decoder.
Builds only the diffusion prior, which the paper reports as its best-performing prior variant, and uses a causal transformer as the denoising network.
External researchers have validated the decoder on Oxford flowers and scaled training to 800 GPUs without script changes.
Supports cascading DDPM with multiple U-Nets for progressive resolution upsampling.
Pre-trained prior checkpoints are available via LAION and Hugging Face, though decoder checkpoints remain works in progress.
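The cascading idea mentioned in the highlights can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `base_stage` and `refine_stage` are hypothetical stand-ins for the U-Nets, the resolutions are arbitrary, and the "refinement" is a small random perturbation, whereas a real cascaded DDPM would run a full diffusion sampling loop at each stage, conditioned on the upsampled lower-resolution image.

```python
import random


def upsample_nearest(image, factor):
    """Repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def base_stage(size, rng):
    """Hypothetical stand-in for the first U-Net: produces a low-resolution sample."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def refine_stage(low_res, factor, rng, detail=0.05):
    """Hypothetical stand-in for a later U-Net conditioned on the lower-resolution image."""
    cond = upsample_nearest(low_res, factor)
    # Add a little "detail" on top of the upsampled conditioning image, clamped to [0, 1].
    return [
        [min(1.0, max(0.0, p + rng.uniform(-detail, detail))) for p in row]
        for row in cond
    ]


def cascade(resolutions=(8, 16, 32), seed=0):
    """Run the stages in order, feeding each output to the next stage."""
    rng = random.Random(seed)
    image = base_stage(resolutions[0], rng)
    outputs = [image]
    for prev, nxt in zip(resolutions, resolutions[1:]):
        if nxt % prev != 0:
            raise ValueError("each resolution must be a multiple of the previous one")
        image = refine_stage(image, nxt // prev, rng)
        outputs.append(image)
    return outputs
```

`cascade()` returns one image per stage, so the intermediate resolutions can be inspected. Splitting the work this way lets each stage handle a single resolution jump instead of asking one network to produce the final size directly.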

Caveats

The author notes the project is no longer state-of-the-art as of May 2022, with future work shifting to the simpler Imagen architecture.
Decoder checkpoints are still marked in-progress, so end-to-end text-to-image generation relies on community-trained weights or your own compute budget.

Verdict

Grab this if you are a researcher studying diffusion priors or reproducing landmark generative models; skip it if you just want a polished API for generating cat photos.