Today, AI systems such as OpenAI's DALL-E 2 or Google's Imagen can only generate two-dimensional images. If text could also be turned into three-dimensional scenes, the visual experience would be far more immersive.
Now, Apple's AI team has unveiled GAUDI, a new neural architecture for 3D scene generation. It can capture complex and realistic 3D scene distributions, render them immersively from a moving camera, and create 3D scenes from text prompts. The model is named after Antoni Gaudí, the famous Spanish architect.
GAUDI is expected to have an impact on many computer vision tasks, and its 3D scene generation capabilities should also benefit research areas such as model-based reinforcement learning and planning, simultaneous localization and mapping (SLAM), and 3D content production.
For now, the quality of the video generated by GAUDI is not high, and many artifacts are visible. Still, the system may be a good starting point and foundation for Apple's ongoing work on AI systems that render 3D objects and scenes. GAUDI is also said to be intended for Apple's XR headset, where it would generate digital locations.
"We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes.
Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene."
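The two-stage recipe described in the abstract can be illustrated with a deliberately tiny toy. This is only a sketch under loud assumptions, not Apple's implementation: linear decoders stand in for the radiance field and camera pose networks, alternating least squares stands in for the latent optimization, and a single Gaussian stands in for the learned generative model. All function names, dimensions, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_latents(observations, latent_dim, iters=5, seed=0):
    """Stage 1 (toy): fit one latent per scene plus a shared linear decoder
    by alternating least squares, so that latents @ decoder ~ observations."""
    rng = np.random.default_rng(seed)
    decoder = rng.normal(size=(latent_dim, observations.shape[1]))
    for _ in range(iters):
        # Solve for the latents with the decoder held fixed.
        latents = np.linalg.lstsq(decoder.T, observations.T, rcond=None)[0].T
        # Solve for the decoder with the latents held fixed.
        decoder = np.linalg.lstsq(latents, observations, rcond=None)[0]
    return latents, decoder

def fit_gaussian_prior(latents):
    """Stage 2 (toy): model the latent distribution with a single Gaussian."""
    return latents.mean(axis=0), np.cov(latents, rowvar=False)

def sample_scenes(mean, cov, radiance_decoder, pose_decoder,
                  radiance_dim, n_samples, seed=0):
    """Draw new latents from the prior, split them, and decode each part."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mean, cov, size=n_samples)
    z_radiance, z_pose = z[:, :radiance_dim], z[:, radiance_dim:]
    return z_radiance @ radiance_decoder, z_pose @ pose_decoder

# Synthetic stand-in data: 64 "scenes", each with a radiance part and a pose part.
rng = np.random.default_rng(42)
n_scenes = 64
radiance_obs = rng.normal(size=(n_scenes, 4)) @ rng.normal(size=(4, 12))
pose_obs = rng.normal(size=(n_scenes, 2)) @ rng.normal(size=(2, 6))

# Disentangled latents: separate codes (and decoders) for radiance and pose.
z_rad, dec_rad = fit_latents(radiance_obs, latent_dim=4)
z_pose, dec_pose = fit_latents(pose_obs, latent_dim=2, seed=1)

# One prior over the concatenated latent, then unconditional sampling.
mean, cov = fit_gaussian_prior(np.hstack([z_rad, z_pose]))
new_radiance, new_pose = sample_scenes(
    mean, cov, dec_rad, dec_pose, radiance_dim=4, n_samples=3
)
print(new_radiance.shape, new_pose.shape)
```

The point of the toy is the shape of the pipeline: per-scene latents are optimized first, with separate codes for scene content and camera pose, and only afterwards is a generative model fit over those latents so that new scenes can be sampled and decoded.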
For engineers and hardcore techies, the details are in Apple's paper, "GAUDI: A Neural Architect for Immersive 3D Scene Generation."