Google invents Smartglasses configured for Touchless Hand Gesture Interaction and the ability to project an OS onto a user's hand
Google got the jump on Apple on foldable smartphones, unveiling its second-generation foldable, the Pixel 9 Pro Fold, on August 13, 2024. One of its next goals is to beat Apple to market with smartglasses. A patent application that surfaced in Europe yesterday focuses on smartglasses designed for touchless hand gesture interaction.
Google's patent covers smartglasses that are configured for touchless interaction with a user. In particular, the disclosed head-mounted device is configured to detect and recognize a movement of a user’s fingers as a moving gesture (i.e., dynamic gesture) conveying an action, such as a “click” or a “scroll.”
It's interesting to note that Google filed for this patent the day after Apple's WWDC23, where hand gestures were introduced with Apple Vision Pro. Of course, it's just coincidental (nudge-nudge, wink-wink).
Google's approach combines low-power and high-power processes in a framework for detecting, recognizing, and responding to dynamic gestures. This lowers the average power consumed by the head-mounted device so that its operating life is not dominated by the touchless interaction capability it provides.
In some aspects, the techniques the patent describes relate to a head-mounted device that includes: a camera configured to capture low-resolution images of a field-of-view (e.g., continuously) and to capture high-resolution images of the field-of-view in response to a trigger signal; a first processor configured to generate the trigger signal when a hand is identified in the low-resolution images; and a second processor, activated by the trigger signal, configured to determine a set of keypoints for each of the high-resolution images, the keypoints corresponding to locations on the hand in the field-of-view. In some aspects, the techniques relate to a system comprising the head-mounted device and a companion device in communication with it, the companion device including a third processor configured to receive the set of keypoints from the head-mounted device, generate a rendered element based on the keypoints, and transmit the rendered element to the head-mounted device for display.
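The split between a continuously running low-power detector and a triggered high-power keypoint model can be pictured as a simple capture loop. The sketch below is a minimal illustration of that idea, assuming hypothetical `camera.capture_low_res()`/`camera.capture_high_res()` calls and placeholder `detect_hand()`/`estimate_keypoints()` functions standing in for the two on-device models; it is not code from the patent.

```python
from dataclasses import dataclass

@dataclass
class Keypoint:
    x: float
    y: float
    name: str = ""

def detect_hand(low_res_frame) -> bool:
    """Low-power stage (first processor): cheap hand/no-hand check."""
    raise NotImplementedError  # placeholder for an on-device detector

def estimate_keypoints(high_res_frame) -> list:
    """High-power stage (second processor): full hand keypoint model."""
    raise NotImplementedError  # placeholder for an on-device keypoint model

def capture_loop(camera):
    """Run continuously: cheap detection on low-res frames, and only
    capture and process high-res frames while a hand is present."""
    while True:
        low_res = camera.capture_low_res()        # continuous, low power
        if detect_hand(low_res):                  # acts as the trigger signal
            high_res = camera.capture_high_res()  # triggered, higher power
            yield estimate_keypoints(high_res)    # passed on for tracking
```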
In some aspects, the techniques described relate to a method for tracking a movement of a hand, the method including: configuring a camera to capture low-resolution images of a field-of-view (e.g., continuously); detecting a hand in the low-resolution images; triggering the camera to capture high-resolution images of the field-of-view while the hand is recognized in the low-resolution images; determining a set of keypoints for each of the high-resolution images corresponding to locations on the hand in the field-of-view; and tracking movements of the set of keypoints over time.
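Tracking "movements of the set of keypoints over time" implies keeping a short per-frame history of keypoint sets. Here is a minimal sketch of such a tracker, reusing the hypothetical `Keypoint` type from the previous sketch and assuming normalized image coordinates:

```python
from collections import deque

class KeypointTracker:
    """Keeps the last N keypoint sets, one per high-resolution frame."""

    def __init__(self, history_len: int = 30):
        self.history = deque(maxlen=history_len)  # oldest entries drop off

    def update(self, keypoints):
        """Append the keypoint set estimated from the latest frame."""
        self.history.append(keypoints)

    def displacement(self, keypoint_index: int):
        """Net (dx, dy) of one keypoint across the stored history."""
        if len(self.history) < 2:
            return (0.0, 0.0)
        first = self.history[0][keypoint_index]
        last = self.history[-1][keypoint_index]
        return (last.x - first.x, last.y - first.y)
```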
In some aspects, the techniques described relate to a system for dynamic gesture detection that includes a head-mounted device and a companion device. The head-mounted device includes: a camera configured to capture low-resolution images of a field-of-view continuously and to capture high-resolution images of the field-of-view while triggered by a trigger signal; a first processor configured to generate the trigger signal while a hand is recognized in the low-resolution images; and a second processor, activated by the trigger signal, configured to determine a keypoint (or set of keypoints) for each of the high-resolution images corresponding to a location on the hand in the field-of-view, track movements of the keypoint(s) over time, and detect a dynamic gesture based on those movements. The companion device, in communication with the head-mounted device, includes a third processor configured to receive the dynamic gesture from the head-mounted device, generate a rendered element based on the dynamic gesture, and transmit the rendered element to the head-mounted device for display.
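One way the second processor could turn keypoint movements into a "swipe" or "pinch" decision is to threshold simple trajectory statistics. The sketch below is illustrative only; the keypoint indices (thumb tip = 4, index fingertip = 8 in a 21-keypoint hand model) and the thresholds are assumptions, not values from the patent.

```python
import math

THUMB_TIP, INDEX_TIP = 4, 8   # assumed indices in a 21-keypoint hand model
SWIPE_DIST = 0.25             # thresholds in normalized image coordinates
PINCH_DIST = 0.05

def detect_gesture(tracker):
    """Classify the tracked movement as 'swipe', 'pinch', or None."""
    if len(tracker.history) < 2:
        return None
    # Swipe: the index fingertip travels far enough across the frame.
    dx, dy = tracker.displacement(INDEX_TIP)
    if math.hypot(dx, dy) > SWIPE_DIST:
        return "swipe"
    # Pinch: thumb tip and index fingertip end up close together.
    thumb = tracker.history[-1][THUMB_TIP]
    index = tracker.history[-1][INDEX_TIP]
    if math.hypot(thumb.x - index.x, thumb.y - index.y) < PINCH_DIST:
        return "pinch"
    return None
```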
Google's patent FIG. 1 below outlines various aspects of their future smartglasses, including the use of Pixel Buds for audio communications; FIG. 4 is a block diagram of a system for dynamic gesture detection; FIG. 9 is a system block diagram of their smartglasses.
Google's patent FIG. 2 below is a perspective view of a hand as seen and imaged by a head-mounted device/smartglasses; FIG. 5 illustrates a keypoint model of a hand. As shown, each finger of a hand may be modeled by a plurality of keypoints (illustrated as circles). The keypoints may be located at the joints of the finger where a movement may occur, and the keypoints may be linked anatomically (illustrated by lines) to form a model of the hand.
As shown, a full model of the hand may require four keypoints per finger (20 finger keypoints) plus one keypoint corresponding to the base of the hand (e.g., the wrist) to which all fingers are referenced, for 21 keypoints in total.
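As a concrete reading of that model, the 21 keypoints and their anatomical links can be enumerated directly. The naming and index ordering below are illustrative assumptions, not the patent's own numbering:

```python
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
JOINTS_PER_FINGER = 4

# Index 0 is the base (wrist); each finger then occupies four consecutive indices.
KEYPOINT_NAMES = ["wrist"] + [
    f"{finger}_{joint}" for finger in FINGERS for joint in range(JOINTS_PER_FINGER)
]
assert len(KEYPOINT_NAMES) == 21  # 5 fingers x 4 joints + 1 wrist

# Anatomical links (the lines in FIG. 5): wrist to each finger base,
# then joint to joint along each finger.
LINKS = []
for f in range(len(FINGERS)):
    base = 1 + f * JOINTS_PER_FINGER
    LINKS.append((0, base))                    # wrist -> first joint of finger
    for j in range(JOINTS_PER_FINGER - 1):
        LINKS.append((base + j, base + j + 1)) # joint -> next joint outward
```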
Google's patent FIG. 6A above illustrates a first possible dynamic gesture (Finger Swipe); FIG. 6B illustrates a second possible dynamic gesture (Finger Pinch); FIG. 7 illustrates a palm-locked rendered element as seen through an augmented reality display. As shown, the rendered element is a system user-interface screen that includes graphics and text arranged as they would be on a fixed screen, but now projected as if the surface of that screen were the palm of the user's hand. The rendered element is palm-locked because it follows the position/orientation (i.e., pose) of the palm as it is moved.
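Palm-locking amounts to re-deriving the rendered element's pose from the palm keypoints on every frame. A minimal 2D sketch follows, using the keypoint indexing assumed in the earlier sketches (wrist = 0, index-finger base = 5, pinky base = 17) and a hypothetical `display.draw()` call; a real system would solve a full 3D pose rather than a 2D center and rotation.

```python
import math

def palm_pose(keypoints):
    """Estimate a 2D palm center and rotation from the wrist, index-base,
    and pinky-base keypoints (indices assumed as in the earlier sketches)."""
    wrist, index_base, pinky_base = keypoints[0], keypoints[5], keypoints[17]
    cx = (wrist.x + index_base.x + pinky_base.x) / 3.0
    cy = (wrist.y + index_base.y + pinky_base.y) / 3.0
    # Orientation: from the wrist toward the midpoint of the finger bases.
    mx = (index_base.x + pinky_base.x) / 2.0
    my = (index_base.y + pinky_base.y) / 2.0
    angle = math.atan2(my - wrist.y, mx - wrist.x)
    return cx, cy, angle

def render_palm_locked(ui_element, keypoints, display):
    """Re-anchor the rendered element to the current palm pose each frame,
    so it follows the hand as it moves."""
    cx, cy, angle = palm_pose(keypoints)
    display.draw(ui_element, position=(cx, cy), rotation=angle)  # hypothetical API
```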