
Apple is developing a way to improve future Hand Tracking Accuracy by using 'Spatiotemporal Smoothing'


Yesterday the U.S. Patent and Trademark Office officially granted Apple a patent that relates to hand tracking and, in particular, to systems, methods, and devices associated with spatiotemporal smoothing for improved hand tracking.

In the patent's background, Apple notes that in various implementations, an extended reality (XR) environment is presented by a head-mounted device (HMD) that includes a scene camera, which captures an image of the physical environment in which the user is present (e.g., a scene), and a display that presents the image to the user. In some instances, this image, or portions thereof, can be combined with one or more virtual objects to present the user with an XR experience.

In other instances, the HMD can operate in a pass-through mode in which the image or portions thereof are presented to the user without the addition of virtual objects. Ideally, the image of the physical environment presented to the user is substantially similar to what the user would see if the HMD were not present. However, due to the different positions of the eyes, the display, and the camera in space, this may not occur, resulting in impaired distance perception, disorientation, and poor hand-eye coordination.

Spatiotemporal Smoothing For Improved Hand Tracking

Overall, the patent covers a method that includes:

  • obtaining uncorrected hand tracking data;
  • obtaining a depth map associated with a physical environment;
  • identifying a position of a portion of the finger within the physical environment based on the depth map and the uncorrected hand tracking data;
  • performing spatial depth smoothing on a region of the depth map adjacent to the position of the portion of the finger; and
  • generating corrected hand tracking data by performing point-of-view (POV) correction on the uncorrected hand tracking data based on the spatially depth-smoothed region of the depth map adjacent to the portion of the finger.
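Read as a pipeline, those steps map onto a short routine. Below is a minimal sketch in Python, assuming a pinhole camera model and a pure horizontal camera-to-eye offset; every name and parameter value here is illustrative, since the patent claims describe the steps, not an implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_hand_tracking(hand_uv, depth_map, fx=600.0, baseline=0.04, r=8):
    """Steps 1-2: hand_uv is an uncorrected hand tracking sample (pixel
    coordinates); depth_map holds per-pixel camera-to-surface distance."""
    u, v = hand_uv
    # Step 3: locate the finger portion in the environment via its depth.
    finger_depth = depth_map[v, u]
    # Step 4: spatial depth smoothing on the region adjacent to the finger.
    patch = depth_map[max(v - r, 0):v + r, max(u - r, 0):u + r]
    smoothed_depth = gaussian_filter(patch, sigma=2.0).mean()
    # Step 5: POV correction - shift the sample by the disparity implied by
    # the smoothed depth (camera-to-eye reprojection, reduced to a 1-D shift).
    disparity = fx * baseline / max(smoothed_depth, 1e-3)
    return (u + disparity, v)
```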

More specifically, Apple notes that in an HMD with a display and a scene camera, the image of the physical environment presented to the user on the display may not always reflect what the user would see if the HMD were not present due to the different positions of the eyes, the display, and the camera in space.

In various circumstances, this results in poor distance perception, disorientation of the user, and poor hand-eye coordination, e.g., while interacting with the physical environment. Thus, in various implementations, images from the scene camera are transformed (e.g., point-of-view (POV) correction) such that they appear to have been captured at the location of the user's eyes using a depth map representing, for each pixel of the image, the distance from the camera to the object represented by the pixel. In various implementations, images from the scene camera are partially transformed (e.g., partial POV correction) such that they appear to have been captured at a location closer to the location of the user's eyes than the location of the scene camera.
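As a concrete illustration of that transform, the sketch below forward-warps an image toward the eye's viewpoint using the depth map, with alpha < 1.0 standing in for the partial POV correction mentioned above. The pinhole model and the purely horizontal eye offset are assumptions made for illustration.

```python
import numpy as np

def pov_correct(image, depth_map, fx=600.0, baseline=0.04, alpha=1.0):
    """Warp `image` so it appears captured at the eye position; alpha < 1.0
    yields a partial correction toward (but short of) the eye location."""
    h, w = depth_map.shape
    out = np.zeros_like(image)
    # Closer surfaces shift more: disparity is inversely proportional to depth.
    disparity = alpha * fx * baseline / np.maximum(depth_map, 1e-3)
    us = np.tile(np.arange(w), (h, 1))
    vs = np.tile(np.arange(h)[:, None], (1, w))
    new_us = np.clip((us + disparity).astype(int), 0, w - 1)
    out[vs, new_us] = image  # forward warp; holes open up where depth jumps
    return out
```

The holes this forward warp leaves at depth discontinuities are exactly the artifacts the depth-map conditioning described next is meant to suppress.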

In various implementations, the depth map is altered to reduce artifacts. For example, in various implementations, the depth map is smoothed so as to avoid holes in the transformed image.

In various implementations, the depth map is clamped so as to reduce larger movements of the pixels during the transformation. In various implementations, the depth map is made static such that dynamic objects do not contribute to the depth map.
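A minimal sketch of those two conditioning steps, smoothing and clamping, might look like this; the sigma and near/far bounds are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def condition_depth(depth_map, near=0.3, far=5.0, sigma=3.0):
    # Smooth to soften depth edges, avoiding holes in the transformed image.
    smoothed = gaussian_filter(depth_map, sigma=sigma)
    # Clamp to bound disparity, which bounds per-pixel motion during the warp.
    return np.clip(smoothed, near, far)
```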

For example, in various implementations, the depth map values at locations of a dynamic object are determined by interpolating the depth map using locations surrounding the locations of the dynamic object. In various implementations, the depth map values at locations of a dynamic object are determined based on depth map values determined at a time the dynamic object is not at the location.
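One simple way to realize that interpolation is to fill each dynamic pixel from its nearest static neighbor, as sketched below; the patent does not mandate a particular scheme, so the nearest-neighbor fill here is an assumption.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_dynamic_depth(depth_map, dynamic_mask):
    """dynamic_mask: True where a moving object (e.g., a hand) was detected."""
    # For every dynamic pixel, find the index of the nearest static pixel...
    _, indices = distance_transform_edt(dynamic_mask, return_indices=True)
    iy, ix = indices
    # ...and borrow its depth, leaving static pixels untouched.
    return np.where(dynamic_mask, depth_map[iy, ix], depth_map)
```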

In various implementations, the depth map is determined using a three-dimensional model of the physical environment without dynamic objects. Using a static depth map may increase spatial artifacts, such as the objects not being displayed at their true locations. However, using a static depth map may reduce temporal artifacts, such as flickering.

Discontinuities occur between uncorrected hand tracking data and transformed images (e.g., POV-corrected images), especially when depth changes suddenly. For example, discontinuities may arise during hand tracking when a user's hand starts in front of a wall or another background close to the scene camera and then moves in front of a deeper background, such as a hallway or an open room.

To improve hand tracking and reduce resource consumption, the method described in Apple's patent may perform spatial depth smoothing around a finger portion (e.g., an index fingertip) and differential temporal depth smoothing thereafter.
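Putting the two stages together, a minimal per-frame sketch might track a smoothed fingertip depth like this; the exponential moving average stands in for the patent's "differential temporal depth smoothing," whose exact form the report doesn't detail.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

class FingertipDepthSmoother:
    def __init__(self, radius=8, sigma=2.0, alpha=0.3):
        self.radius, self.sigma, self.alpha = radius, sigma, alpha
        self.prev = None  # smoothed fingertip depth from the previous frame

    def update(self, depth_map, fingertip_uv):
        u, v = fingertip_uv
        r = self.radius
        # Spatial smoothing over the depth region adjacent to the fingertip.
        patch = depth_map[max(v - r, 0):v + r, max(u - r, 0):u + r]
        spatial = gaussian_filter(patch, sigma=self.sigma).mean()
        if self.prev is None:
            self.prev = spatial
        # Temporal smoothing: step toward the new value instead of jumping,
        # damping the wall-to-hallway discontinuity described above.
        self.prev += self.alpha * (spatial - self.prev)
        return self.prev
```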

(Patent figure: hand gesture smoothing architecture)

Developers and engineers can review granted patent 12223117 in depth here.

Some of Apple's Inventors

  • Emmanuel Piuze-Phaneuf: Engineering Manager – Camera, Vision Pro
  • Paul Lacey: Perception Input Architect (Gaze/Hands, Microgestures)
  • Julian Shutzberg: No LinkedIn profile found

 

A micro-gesture (microgesture) is a gesture created from a defined user action that relies on small variations in configuration, or micro-motions. In the most general sense, micro-gestures can be described as micro-interactions. Because micro-gestures use small variations in configuration and motion, gesture actions can be achieved with significantly less energy than typical touch, motion, or sensor-enabled gestures.
