Apple Wins a Surprising Patent for a Future Entertainment System that Could Respond to a User's Presence & Activity
Earlier today we posted a granted patent report covering an overview of the 56 patents granted to Apple today. The most interesting patent in this group was one titled "Multi-media computing or entertainment system for responding to user presence and activity."
Apple's advanced work on depth sensors and cameras has been developed in large part at Apple's Israeli base. Many PrimeSense engineers worked on Apple's TrueDepth camera to deliver Face ID, Animoji and more. There are at least a dozen patents covering the use of a depth camera like TrueDepth for future gesture recognition (01, 02, 03, 04, 05, 06, 07 and more in our archives). Today's granted patent also appears to include technology from LinX, a company Apple acquired back in 2015.
Controlling future Apple devices accurately with in-air gestures will take a combination of AI, machine learning and computer vision technologies working seamlessly together. Three of Apple's engineers listed on today's granted patent have such backgrounds. Thorsten Gernoth is a Computer Vision Engineer; Xiaojin Shi is Apple's Manager on the Computer Vision Markup Language team; and Feng Tang is a Machine Learning Algorithm Manager.
In the big picture, a major Apple patent on gesturing systems came to light today, not as a patent application but rather as a granted patent. It's one of the most elaborate patents on this technology to date from Apple. The sheer number of patents that Apple has on this technology, and the fact that Apple has already used some of it to create the TrueDepth camera, shows how serious Apple is about bringing gesturing controls to market, especially for future home entertainment systems. Apple may have dabbled in this project with HomePod, but where it's eventually going is far beyond that: controlling an entire entertainment system that will definitely include delivering a new dimension to gaming.
Because we never got to cover today's granted patent as a patent application, today's report takes on the style of an application report by digging further into Apple's invention to provide you with a better overview of it.
Apple's Patent Background
In the patent background, Apple provides an overview of the context for the invention before laying out the advances it could bring to future entertainment systems.
Apple notes that traditional user interfaces for computers and multi-media systems are not ideal for a number of applications and are not sufficiently intuitive for many other applications.
In a professional context, providing stand-up presentations or other types of visual presentations to large audiences is one example where controls are less than ideal and, in the opinion of many users, insufficiently intuitive. In a personal context, gaming control and content viewing/listening are but two of many examples.
In the context of an audio/visual presentation, the manipulation of the presentation is generally upon the direction of a presenter that controls an intelligent device (e.g. a computer) through use of remote control devices.
Similarly, gaming and content viewing/listening also generally rely upon remote control devices. These devices often suffer from inconsistent and imprecise operation or require the cooperation of another individual, as in the case of a common presentation.
Some devices, for example in gaming control, use a fixed location tracking device (e.g., a trackball or joystick), a hand cover (i.e., a glove), or body-worn/held devices having incorporated motion sensors such as accelerometers. Traditional user interfaces including multiple devices such as keyboards, touchpads/screens, and pointing devices (e.g. mice, joysticks, and rollers) require both logistical allocation and a degree of skill and precision, but can often more accurately reflect a user's expressed or implied desires. The equivalent ability to reflect user desires is more difficult to implement with a remote control system.
When a system has an understanding of its users and the physical environment surrounding the user, the system can better approximate and fulfill user desires, whether expressed literally or impliedly. For example, a system that approximates the scene of the user and monitors the user activity can better infer the user's desires for particular system activities.
In addition, a system that understands context can better interpret express communication from the user such as communication conveyed through gestures.
As an example, gestures have the potential to overcome the aforementioned drawbacks regarding user interface through conventional remote controls.
Gestures have been studied as a promising technology for man-machine communication.
Various methods have been proposed to locate and track body parts (e.g., hands and arms) including markers, colors, and gloves. Current gesture recognition systems often fail to distinguish between various portions of the human hand and its fingers.
Many easy-to-learn gestures for controlling various systems can be distinguished and utilized based on specific arrangements of fingers.
However, current techniques fail to consistently detect the portions of fingers that can be used to differentiate gestures, such as their presence, location and/or orientation by digit.
Granted Patent: Entertainment System for Responding to User Presence and Activity
Apple's granted patent discusses systems, methods, and computer readable media to improve the operation of user interfaces including scene interpretation, user activity, and gesture recognition.
The invention covers the ability to interpret the intent or desire of one or more users and to respond to the perceived user desires, whether express or implied. Many embodiments of the invention employ one or more sensors used to interpret the scene and user activity. Some example sensors may be a depth sensor, an RGB sensor and even ordinary microphones or a camera with accompanying light sensors.
Varying embodiments of the invention may use one or more sensors to detect the user's scene.
For example, if the system serves as a living room entertainment system, the scene may be the user's living room as well as adjacent areas that are visible to the sensors. The scene may also be as small as the space in front of a user's workstation, the interior of a car, or even a small area adjacent to a user's smart phone or other portable device (to interpret user desires with respect to that device).
The scene may additionally be large, for example, including an auditorium, outdoor area, a playing field, or even a stadium. In sum, the scene may be any area where there is a value for intelligent systems such as computers or entertainment systems to interpret user intent or desire for system activity.
Interpreting User Activity
In some embodiments, the system may also sense and interpret user activity, such as the posture, position, facial expressions, and gestures of the user. The information may be used to alter the state of the system (e.g. computer or entertainment system) to better suit the user(s).
Many embodiments of the invention allow for direct user manipulation of the system either to operate system settings or to control an application of the system such as games, volume, tuning, composing, or any manipulations that a user might expressly desire from the system in use.
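To make the idea of altering system state in response to user activity more concrete, here is a minimal sketch of how presence information might drive pausing and powering down. It is not taken from the patent: the names `SystemState`, `Observation` and `next_state`, and both time thresholds, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SystemState(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    POWERED_DOWN = "powered_down"


@dataclass
class Observation:
    """Hypothetical summary of what the sensors currently report."""
    user_present: bool
    seconds_absent: float = 0.0


def next_state(
    current: SystemState,
    obs: Observation,
    pause_after: float = 10.0,        # assumed threshold, in seconds
    power_down_after: float = 600.0,  # assumed threshold, in seconds
) -> SystemState:
    """Pick the next system state from the latest presence observation."""
    if current is SystemState.POWERED_DOWN:
        # Waking the system back up is outside the scope of this sketch.
        return current
    if obs.user_present:
        return SystemState.PLAYING
    if obs.seconds_absent >= power_down_after:
        return SystemState.POWERED_DOWN
    if obs.seconds_absent >= pause_after:
        return SystemState.PAUSED
    return current
```

Under these assumed thresholds, a user who leaves the room for 15 seconds would cause playback to pause, and playback would resume once the user is detected again.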
Express Communication with Fine Hand Gestures
In the case of express communication with a system, some embodiments contemplate the identification of fine hand gestures based on real-time depth information obtained from, for example, optical- or non-optical-type depth sensors.
More particularly, techniques of the invention may analyze depth information in "slices" (three-dimensional regions of space having a relatively small depth) until one or more candidate hand structures are detected.
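As a rough illustration of the "slices" idea, the sketch below scans a depth map from near to far in thin depth ranges and stops at the first slice that contains enough pixels to be worth examining as a candidate hand. The patent summary does not give the actual detection criteria, so the function name `first_candidate_slice`, the slice thickness, the scan range and the `min_pixels` threshold are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[float]]  # depth in meters; 0.0 is treated as "no reading"


def first_candidate_slice(
    depth: DepthMap,
    near: float = 0.2,       # assumed start of the scan range, in meters
    far: float = 2.0,        # assumed end of the scan range, in meters
    thickness: float = 0.1,  # assumed slice depth, in meters
    min_pixels: int = 4,     # assumed minimum pixel count for a candidate
) -> Optional[Tuple[float, List[List[bool]]]]:
    """Scan thin depth slices front to back.

    Returns (slice_start, mask) for the first slice holding at least
    min_pixels pixels, or None if no slice qualifies.
    """
    steps = int(round((far - near) / thickness))
    for i in range(steps):
        start = near + i * thickness
        end = start + thickness
        # Mark every pixel whose depth falls inside this slice.
        mask = [[start <= d < end for d in row] for row in depth]
        if sum(cell for row in mask for cell in row) >= min_pixels:
            return start, mask
    return None
```

A real system would go on to test the shape of the masked region before accepting it as a hand; this sketch covers only the slicing step.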
At the time of the patent application (2015), some examples of devices having this type of depth sensing capability were made by LinX Imaging. Interestingly enough, Apple acquired Israel-based LinX Imaging in 2015.
Apple notes that "no limitation is intended by these hardware and software descriptions and the varying embodiments of the inventions herein may include any manner of computing devices such as Macs, PCs, PDAs, phones, servers, or even embedded systems."
Last year Patently Apple posted a report titled "Apple Reveals a Future Stereo System that Automatically Readjusts Sound when Speakers are moved or Added." The patent covered the use of depth cameras to understand where the user was in the room so as to always ensure the user was getting a superior audio experience. The patent figure below is from that Apple patent.
So Apple has been working on this project one stage at a time.
Apple's patent FIGS. 4a and 4b below are embodiments that contemplate detection of the scene geometry by the system or devices and equipment in cooperation with the system. One or more sensors may be employed to detect the scene geometry, which generally refers to the structure of the room in two or three dimensions.
For example, one type of scene geometry contemplated for embodiments of the invention involves determining or estimating the location and/or nature of each element in a space visually or acoustically exposed to the system (e.g. a multi-media entertainment center or computer system).
Thus, varying embodiments of the invention may determine or estimate the two- or three-dimensional position of vertical surfaces #410, such as walls; horizontal surfaces #435, such as floors; furniture #410 or other chattel #440; fixtures #445; as well as living things, such as pets or humans #430. By understanding scene geometry, the system may provide a better experience for the user.
In addition to depth detection, some contemporary depth cameras also detect infrared reflectance. Information regarding infrared reflectance of an object reveals properties of the object including information about color. Information detected regarding infrared reflectance can be used with the depth information to aid in identifying objects in the room or potentially determining how those objects affect the use of the system in the scene (e.g., the effects on a user's perception of light and sound).
Apple's patent FIG. 5 above shows an illustrative process for detecting and responding to user intent or desire; FIG. 6 shows an illustrative process for evaluating scene geometry.
Apple's patent FIG. 7 below illustrates example conceptions regarding user activity; FIG. 8 illustrates example conceptions regarding user activity indicators.
Apple's patent FIGS. 9A and 9B above illustrate sample audio paths. This part of the patent may have been in some way used with Apple's HomePod; Patent FIG. 10 shows an example process for responding to user activity.
Apple's patent FIG. 11 below shows an example process for analyzing user engagement; FIG. 13 shows, in flowchart form, a gesture identification operation; FIGS. 14 and 15 illustrate a three-dimensional image system in accordance with one embodiment.
Apple's patent FIG. 17 below illustrates a candidate hand mask generation operation; FIG. 20 shows, in flowchart form, a feature extraction operation; and FIG. 21 shows a depth-aware filtering approach.
Apple's patent FIGS. 22 and 23 illustrate one approach to representing a hand's volume (depth map); and lastly FIG. 24 shows, in block diagram form, a two-stage gesture classifier.
Overview of the Patent Segments
- Exemplary Hardware and Software
- System Response
- Pausing and Powering Down the System or Media Presentation
- User Communicating with the System
- Gesture Recognition & Fine Gesture Detection
Apple's granted patent 10,048,765 was originally filed in Q3 2015 and published today by the US Patent and Trademark Office.
Patently Apple presents only a brief summary of granted patents with associated graphics for journalistic news purposes as each Granted Patent is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any Granted Patent should be read in its entirety for full details.