Text-to-3D that actually looks good, mostly
A NeurIPS spotlight paper replaces DreamFusion's SDS with variational score distillation to squeeze higher fidelity out of Stable Diffusion.

What it does ProlificDreamer generates 3D assets from text prompts through a three-stage pipeline: NeRF initialization with VSD guidance, geometry refinement via DMTet, and final texturing. It’s built on top of stable-dreamfusion and requires substantial GPU memory—up to ~27GB in stage one for 512×512 rendering.
The interesting bit The core trick is Variational Score Distillation (VSD), which treats the 3D representation as a random variable rather than optimizing a single point estimate. The authors claim this yields more diverse and higher-fidelity results than standard SDS, though you’ll need to run the full three-stage gauntlet to see the benefit.
Key highlights
- Three-stage pipeline: NeRF → geometry refinement → texturing, with a
run.shscript to automate the handoff - VSD guidance in stages 1 and 3; stage 2 uses SDS with normal supervision for mesh cleanup
- Multi-particle generation supported only in stage 1, trading compute for diversity
- Already absorbed into the broader Threestudio ecosystem
- NeurIPS 2023 Spotlight; code is the official implementation
Caveats
- The “multi-face Janus problem” is explicitly acknowledged as prevalent—Stable Diffusion wasn’t trained on multi-view data, so your 3D pineapple may have too many eyes
- Stage 1 demands ~27GB VRAM; this is not laptop-friendly research
- The README suggests brute-force solutions to quality issues: “increase lambda_entropy” for fog, “increase density_thresh” for floaters, “try different seeds” for Janus faces
- MVDream integration to fix the multi-view problem is on the TODO list, unchecked
Verdict Worth a look if you’re doing text-to-3D research and have the hardware to burn. Skip it if you need production-ready assets out of the box, or if your GPU has less memory than a mid-2010s gaming rig.