The models use this learned "denoising" process to create images based on a user's text prompts. But diffusion models underperform at directly generating realistic 3D shapes because there are not ...