VAST-AI-Research/TripoSG
A foundation model for generating high-fidelity 3D shapes from images using rectified flow transformers trained on 2M Image-SDF pairs.

TripoSG is an image-to-3D generation model that produces a high-fidelity 3D mesh from a single image. It combines a large-scale rectified flow transformer with a VAE that encodes signed distance functions (SDFs), trained with hybrid supervision that includes an SDF loss, surface normal guidance, and an eikonal loss. The model handles diverse input styles, including photorealistic images, cartoons, and sketches, and produces shapes with sharp geometric features and fine surface details.
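
To make the hybrid supervision concrete, here is a minimal sketch of how an SDF term, a normal-alignment term, and an eikonal term could be combined into one loss. It is not TripoSG's training code: the function names (`sphere_sdf`, `gradient`, `hybrid_loss`), the weights `w_normal` and `w_eik`, the L1 and cosine forms of the terms, and the finite-difference gradients are all assumptions chosen for illustration, with an analytic sphere standing in for ground-truth geometry.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def gradient(sdf, p, eps=1e-4):
    """Central finite-difference gradient of sdf at p (illustrative only)."""
    grad = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2 * eps))
    return grad

def hybrid_loss(pred_sdf, true_sdf, points, w_normal=1.0, w_eik=0.1):
    """Mean of three terms over the sample points (weights are assumptions):
    - SDF term: absolute error between predicted and true distance
    - normal term: 1 - cosine between predicted and true gradient directions
    - eikonal term: squared deviation of the predicted gradient norm from 1
    """
    sdf_term = normal_term = eik_term = 0.0
    for p in points:
        sdf_term += abs(pred_sdf(p) - true_sdf(p))
        g_pred = gradient(pred_sdf, p)
        g_true = gradient(true_sdf, p)
        n_pred = math.sqrt(sum(g * g for g in g_pred))
        n_true = math.sqrt(sum(g * g for g in g_true))
        cos = sum(a * b for a, b in zip(g_pred, g_true)) / (n_pred * n_true)
        normal_term += 1.0 - cos
        eik_term += (n_pred - 1.0) ** 2
    n = len(points)
    return (sdf_term + w_normal * normal_term + w_eik * eik_term) / n

# Sample points away from the origin, where the sphere SDF gradient is defined.
points = [(0.5, 0.2, -0.1), (1.2, 0.0, 0.3), (-0.7, 0.9, 0.4)]

exact = hybrid_loss(sphere_sdf, sphere_sdf, points)
# A scaled field has the right zero level set and normals but violates the
# eikonal constraint, since its gradient norm is 2 instead of 1.
scaled = hybrid_loss(lambda p: 2.0 * sphere_sdf(p), sphere_sdf, points)
print(exact, scaled)
```

The scaled field illustrates why the eikonal term is useful: it shares the sphere's surface and normal directions, yet it is not a valid distance field, and only the SDF and eikonal terms penalize it.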