Apple’s New AI Model Creates 3D Depth Maps From 2D Images in Less Than a Second


Example images of a rabbit and a cat shown alongside their corresponding depth maps, with boxed regions highlighting fine detail. | Credit: Apple Machine Learning Research

Apple’s Machine Learning Research team created a new AI model that promises significant improvements in how computer vision models analyze three-dimensional depth within a two-dimensional image.

The new AI model, as reported by VentureBeat, is called Depth Pro and is detailed in a new paper, “Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.” Depth Pro promises to quickly create detailed 3D depth maps from individual 2D images. The paper’s abstract explains that the model can produce a 2.25-megapixel depth map from a single image in 0.3 seconds on a consumer-grade GPU.

Although devices like Apple’s latest iPhones can create depth maps using on-device sensors, most still images have no accompanying real-world depth data. However, depth maps for these images can be highly beneficial for numerous applications, including routine image editing. For example, if someone wants to edit only a subject or introduce an artificial “lens” blur to a scene, a depth map can help software create precise masks. A depth model can also aid AI image generation, since an understanding of scene depth helps a synthesis model produce more realistic results.
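To make the “lens” blur idea concrete, here is a minimal, hypothetical sketch, not Apple’s pipeline: it blends a sharp photo with a blurred copy based on each pixel’s distance from a chosen focal depth. The file names, the 0-to-1 near-to-far depth convention, and the falloff constant are all illustrative assumptions.

```python
import cv2
import numpy as np

# Hypothetical depth-guided "lens" blur (illustrative only, not Apple's code).
# Assumes image.jpg and a matching single-channel depth map depth.png,
# where 0 = nearest and 255 = farthest.
image = cv2.imread("image.jpg").astype(np.float32)
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

focal_depth = 0.2                                # depth plane to keep in focus
blurred = cv2.GaussianBlur(image, (21, 21), 0)   # fully defocused version

# Per-pixel blend weight: the farther a pixel lies from the focal plane,
# the more of the blurred image it receives (clamped to [0, 1]).
weight = np.clip(np.abs(depth - focal_depth) / 0.5, 0.0, 1.0)[..., None]
result = (1.0 - weight) * image + weight * blurred

cv2.imwrite("lens_blur.jpg", result.astype(np.uint8))
```

The same per-pixel weighting can be thresholded instead of blended to produce a hard selection mask for subject-only edits.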

Apple’s Depth Pro model versus competing depth map models. | Credit: Apple Machine Learning Research

As the Apple researchers — Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun — explain, an effective zero-shot metric monocular depth estimation model must swiftly produce accurate, high-resolution results to be helpful. A sloppy depth map is of little value.

“Depth Pro produces high-resolution metric depth maps with high-frequency detail at sub-second runtimes. Our model achieves state-of-the-art zero-shot metric depth estimation accuracy without requiring metadata such as camera intrinsics and traces out occlusion boundaries in unprecedented detail, facilitating applications such as novel view synthesis from single images ‘in the wild,’” Apple researchers explain. However, the team acknowledges some limitations, including trouble dealing with translucent surfaces and volumetric scattering.

A photo of a geometric footbridge over a garden (left) and its depth map (right). Photo credit for the example photo used: Jeremy Gray
A sunset over a rocky landscape (left) and its depth map (right). Photo credit for the example photo used: Jeremy Gray

As VentureBeat explains, beyond photo editing and novel view synthesis applications, a depth map model could also prove useful for augmented reality (AR) applications, wherein virtual objects must be accurately placed within physical space. The Depth Pro model handles both relative and absolute (metric) depth, which is vital for use cases like AR that require real-world measurements.
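The distinction matters in practice. A toy illustration, not drawn from the paper: metric depth carries real-world scale, while relative depth preserves only ordering, which is not enough to place a virtual object a specific number of meters away.

```python
import numpy as np

# Toy example: metric (absolute) depth is expressed in meters, so an AR app
# can place a virtual object exactly 1.6 m from the camera.
metric_depth = np.array([[0.8, 1.6],
                         [3.2, 6.4]])

# Normalizing to [0, 1] yields relative depth: the ordering survives, but a
# scene twice as large would produce the identical map, so scale is lost.
relative_depth = (metric_depth - metric_depth.min()) / (
    metric_depth.max() - metric_depth.min()
)
```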

People can test Depth Pro for themselves on Hugging Face and learn much more about the inner workings of the depth model by reading Apple’s new research paper.
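Apple has also published inference code and model weights alongside the paper. The snippet below follows the usage example in the project’s README at the time of writing; the `depth_pro` package and function names are taken from that README and may change.

```python
import depth_pro

# Load the pretrained model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an image; f_px is the focal length in pixels when the file's EXIF
# data provides it (otherwise the model estimates the focal length itself).
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# Inference returns metric depth in meters plus the estimated focal length.
prediction = model.infer(image, f_px=f_px)
depth_m = prediction["depth"]
focal_length_px = prediction["focallength_px"]
```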


Image credits: Apple Machine Learning Research


