Earlier today Patently Apple posted a granted patent report titled "Apple Granted 52 Patents Today Covering 3D Glasses, In-Air Gesturing and a Micro-LED Patent for Future iDevice." Apple has many patents covering a 3D depth camera used to capture and recognize gestures. Technically, Apple could introduce the capability with the iPhone 8 if it chose to, because the camera that will provide users with advanced facial recognition is the same 3D sensing camera that could recognize in-air gestures. Over the weekend I reported on a Microsoft patent covering the recognition of hand gestures and more for next-gen Surface devices.
In this report the focus is on a surprising granted patent from Facebook describing an all-new desktop system designed to capture a user's hand gestures to control the user interface. The patent's sole inventor is Robert Wang of Facebook's Oculus team. Wang is a research scientist at Oculus VR building computer vision systems and advanced technology for virtual reality. What's more important to recognize is the Silicon Valley trend of making in-air gesturing a major user interface tool in the future. It could be used for desktops, notebooks, tablets and, yes, VR and mixed reality headsets.
Facebook's Patent Background
Determining the 3D pose of a user's hands, including their 3D position, orientation and the configuration of the fingers, is referred to as "hand pose estimation." Hand pose estimation across a time sequence is referred to as "hand tracking." Hand tracking permits natural hand motions or gestures to be used as input to a computer system. However, traditional methods of hand tracking through image processing have not been efficient enough or robust enough to control a computer system in real-time. Instead, users have had to wear special instrumented or patterned gloves for a computer system to track the hands.
In some applications, such as computer aided design or entertainment applications, having to wear a glove to facilitate hand tracking is undesirable and impractical. Instrumented gloves are cumbersome and can reduce dexterity. Any type of glove can be uncomfortable to wear for a significant amount of time. The act of putting on or taking off a glove can be tedious and detract from the task at hand. Ideally, a user can interact with a computer using his hands in a completely unencumbered manner.
There is, therefore, a need in the art for improved systems and methods for gesture-based control.
Facebook's Invention: Gesture-Centric Control System
Facebook's patent covers systems and methods that will permit a user to interact with a computer using hand gestures. The configuration and movements of the hands and fingers, or hand gestures, can be used as input. A computer can generate a display that responds to these gestures. The generated display can include objects or shapes that can be moved, modified or otherwise manipulated by a user's hands.
In one embodiment, a pair of imaging devices mounted above a desk is used to record images of the user's hands. The image regions corresponding to each hand are determined and encoded as descriptive features. Each feature is used to query a precomputed database that relates the descriptive features to 3D hand poses. The 3D poses of each hand are analyzed to interpret gestures performed by the user. One example of such a gesture is the action of grabbing. These interpreted gestures, as well as the 3D hand poses, can be used to interact with a computer.
In one embodiment, the imaging device can include a camera. The camera can be a color video camera, an infrared camera, an ultraviolet camera, or a hyperspectral camera. The hand region is segmented from the background based on the contrast between the skin tone of the hand region and the color, brightness or texture of the background. The descriptive feature used to encode each segmented hand image is a low-resolution silhouette of each hand region.
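To make the color-camera path concrete, here is a minimal sketch of the two steps described above: segmenting the hand by skin tone and encoding the region as a low-resolution silhouette. The RGB thresholds and the 16x16 feature size are illustrative assumptions, not values from the patent.

```python
import numpy as np

def silhouette_feature(image, skin_lo=(90, 40, 20), skin_hi=(255, 180, 140), size=16):
    """Segment a hand by a crude RGB skin-tone threshold, then encode the
    region as a low-resolution binary silhouette (thresholds are assumed)."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    mask = ((r >= skin_lo[0]) & (r <= skin_hi[0]) &
            (g >= skin_lo[1]) & (g <= skin_hi[1]) &
            (b >= skin_lo[2]) & (b <= skin_hi[2]))
    # Downsample the binary mask to a size x size silhouette by block voting.
    h, w = mask.shape
    ys = np.linspace(0, h, size + 1, dtype=int)
    xs = np.linspace(0, w, size + 1, dtype=int)
    sil = np.zeros((size, size), dtype=np.uint8)
    for i in range(size):
        for j in range(size):
            block = mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            sil[i, j] = 1 if block.mean() > 0.5 else 0
    return sil
```

A real system would use a calibrated skin model rather than fixed thresholds, but the output shape is the same: a tiny binary image compact enough to serve as a database key.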
In another embodiment, the cameras used to record the user can include depth cameras. One type of depth camera is an active stereo depth camera, in which an infrared pattern is projected from a known position near each camera and the reflected pattern observed by the camera is interpreted as a depth image. The hand region is segmented from the background based on the calibrated 3D location of the desk and other objects in the background. Non-background regions of the depth image are presumed to be the user's hands. The descriptive feature used to encode each segmented hand depth image is a low-resolution depth image of each hand region.
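The depth-camera path is even simpler to sketch: anything measurably closer to the camera than the calibrated desk depth is presumed to be a hand, and the segmented region is downsampled into a low-resolution depth feature. The tolerance and feature size below are assumptions for illustration.

```python
import numpy as np

def hand_depth_feature(depth, background_depth, tol=15.0, size=16):
    """Segment hands as everything closer than the calibrated background
    (desk) depth, then encode the region as a low-resolution depth image.
    'tol' (in depth units) and 'size' are illustrative assumptions."""
    mask = depth < (background_depth - tol)   # non-background => hand
    masked = np.where(mask, depth, 0.0)       # zero out the background
    h, w = masked.shape
    assert h % size == 0 and w % size == 0, "sketch assumes divisible sizes"
    # Block-average down to a size x size depth feature.
    return masked.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
```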
One aspect of the invention relates to computing a database associating hand features to 3D hand pose. One way to achieve this database is by calibrating the interaction region and using computer graphics to render all possible hand poses in the interaction region. Each hand pose is rendered from the point of view of each camera, and the resulting images are encoded as descriptive features. The features from each camera view are then associated in the database with the hand pose used to generate the features. One way to reduce the size of the database is to render only the finger configurations that are used for gestures relevant to the system.
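The database-construction loop above can be sketched as follows. The real system renders each candidate hand pose with computer graphics from each camera's viewpoint; here a toy stand-in renderer and a toy pose parameterization (both hypothetical, not from the patent) illustrate the association of rendered features with the poses that generated them, and the restriction to only the pose grid the system cares about.

```python
import itertools
import numpy as np

def render_silhouette(pose, size=16):
    """Stand-in renderer: the real system would render a full 3D hand model
    per camera view. Here a pose is a toy (x, y, spread) triple drawn as a
    filled block, purely for illustration."""
    x, y, spread = pose
    sil = np.zeros((size, size), dtype=np.uint8)
    sil[y:y + spread, x:x + spread] = 1
    return sil

def build_pose_database(poses, size=16):
    """Associate each rendered feature with the pose that generated it.
    Features are stored compactly as byte keys."""
    db = {}
    for pose in poses:
        db[render_silhouette(pose, size).tobytes()] = pose
    return db

# Render only the pose configurations relevant to the system (here: a small
# grid of toy poses), mirroring the patent's database-size reduction.
poses = list(itertools.product(range(0, 8, 2), range(0, 8, 2), (3, 5)))
database = build_pose_database(poses)
```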
Another aspect of the invention uses descriptive image features to query the database relating image features to 3D hand poses. One way to achieve the image feature is to use a low-resolution silhouette of the segmented hand image or depth image. Another way to achieve the image feature is to use locality sensitive hashing of the segmented hand image or depth image. Another way to achieve the image feature is to use boosting on the database of hand images to learn the most descriptive elements of the hand image for distinguishing 3D hand pose. The hashed or boosted features can be stored compactly as short binary codes. Given input image features generated from the recorded hand images, the database can be queried by comparing the input feature with each feature in the database. The most similar features in the database and their corresponding 3D hand poses determine the estimated 3D hand pose of the user.
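A minimal sketch of the query side, under stated assumptions: features are packed into the "short binary codes" the patent mentions, and the database is scanned for the stored code nearest the input in Hamming distance. A production system would use locality-sensitive hashing or boosted features to avoid the linear scan; the function names here are illustrative.

```python
import numpy as np

def to_binary_code(feature):
    """Pack a binary feature into a compact byte string (a short binary code)."""
    return np.packbits(feature.astype(np.uint8).ravel()).tobytes()

def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def query_pose(input_feature, db):
    """Return the pose whose stored code is nearest the input feature in
    Hamming distance. 'db' maps binary codes to poses; this linear scan is
    a sketch of the database query the patent describes."""
    code = to_binary_code(input_feature)
    return min(db.items(), key=lambda kv: hamming(code, kv[0]))[1]
```

Because the most similar database entries determine the estimate, the query degrades gracefully: a feature corrupted by a few flipped pixels still retrieves the correct pose.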
Another aspect of the invention interprets 3D hand poses as hand gestures. One particularly significant gesture is the precise grabbing or pinching gesture, where the index finger and thumb make contact. One way to achieve robust recognition of the grabbing gesture is by detecting extrema of the segmented hand image or depth image. The extrema points are corresponded with the predicted locations of the index and thumb fingertips from the 3D hand pose. If a correspondence is found, the identified thumb and index fingertips in the hand images are then tested for contact.
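The two-stage pinch test described above can be sketched as follows: match detected image extrema to the fingertip positions predicted by the 3D pose, and only then test the matched tips for contact. The pixel thresholds are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_pinch(extrema, thumb_tip, index_tip,
                 match_radius=12.0, contact_dist=8.0):
    """Return True if the user appears to be pinching. 'extrema' are (x, y)
    extrema detected in the segmented hand image; 'thumb_tip'/'index_tip'
    are fingertip locations predicted from the 3D pose. Thresholds (pixels)
    are assumed for illustration."""
    def nearest(point):
        # Correspond a predicted fingertip with its closest image extremum.
        d = [np.hypot(ex[0] - point[0], ex[1] - point[1]) for ex in extrema]
        i = int(np.argmin(d))
        return extrema[i] if d[i] <= match_radius else None

    thumb = nearest(thumb_tip)
    index = nearest(index_tip)
    if thumb is None or index is None:
        return False  # no correspondence found for one of the fingertips
    # Correspondence found: test the identified tips for contact.
    return bool(np.hypot(thumb[0] - index[0], thumb[1] - index[1]) <= contact_dist)
```

Requiring the correspondence step first is what makes the test robust: a stray extremum far from any predicted fingertip cannot trigger a false pinch.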
Facebook's patent FIG. 1 illustrated below shows a first configuration of a preferred embodiment using two cameras above a desk to capture hand gestures made by the user to control elements of a GUI.
Facebook's patent FIG. 2 illustrated below shows an alternative configuration wherein a single depth camera above a desk is used to capture the user's hand gestures.
In Facebook's patent FIG. 3 noted below we're able to see a configuration where two cameras are below a transparent surface to capture a user's hand gesture to control the computer's interface.
In Facebook's patent FIG. 4 illustrated below we're able to see a configuration wherein two cameras are mounted above a monitor and the user is standing; FIG. 5 illustrates the process of generating a database of hand features.
Facebook's patent filing was originally made in Q3 2012 and granted to them earlier this month by the U.S. Patent and Trademark Office.