Apple Has Won a Vision Pro Patent Relating to visionOS
Yesterday the U.S. Patent and Trademark Office officially granted Apple a patent that relates to Vision Pro and, more specifically, to visionOS, which Apple generically describes as covering electronic devices that present three-dimensional environments, via a display generation component, that include virtual objects.
In the patent's background, Apple notes that methods and interfaces for interacting with environments that include at least some virtual elements (e.g., applications, augmented reality environments, mixed reality environments, and virtual reality environments) are cumbersome, inefficient, and limited.
For example, systems that provide insufficient feedback for performing actions associated with virtual objects, systems that require a series of inputs to achieve a desired outcome in an augmented reality environment, and systems in which manipulation of virtual objects is complex, tedious, and error-prone create a significant cognitive burden on the user and detract from the experience of the virtual/augmented reality environment. In addition, these methods take longer than necessary, thereby wasting energy; this latter consideration is particularly important in battery-operated devices.
Apple's granted patent/invention primarily covers aspects of visionOS for Apple's spatial computer known as Vision Pro. In some embodiments, the computer system has one or more eye-tracking components. In some embodiments, the computer system has one or more hand-tracking components.
Apple notes that there's a need for electronic devices with improved methods and interfaces for navigating and interacting with user interfaces. Such methods and interfaces may complement or replace conventional methods for interacting with user interfaces in a three-dimensional environment. Such methods and interfaces reduce the number, extent, and/or the nature of the inputs from a user and produce a more efficient human-machine interface.
In some embodiments, the user interacts with the GUI through finger contacts and gestures, movement of the user's eyes and hand in space relative to the GUI or the user's body as captured by cameras and other movement sensors, and voice inputs as captured by one or more audio input devices.
In some embodiments, an electronic device navigates between user interfaces based at least on detecting a gaze of the user. In some embodiments, an electronic device enhances interactions with control elements of user interfaces. In some embodiments, an electronic device scrolls representations of categories and subcategories in a coordinated manner. In some embodiments, the electronic device navigates back from user interfaces having different levels of immersion in different ways.
The systems, methods, and GUIs provide improved ways for an electronic device to interact with and manipulate objects in a three-dimensional environment. The three-dimensional environment optionally includes one or more virtual objects, one or more representations of real objects in the physical environment of the electronic device (e.g., displayed as photorealistic "pass-through" representations of the real objects, or visible to the user through a transparent portion of the display generation component of Vision Pro), and/or representations of users in the three-dimensional environment.
In some embodiments, an electronic device navigates between user interfaces based at least on detecting a gaze of the user. In some embodiments, the electronic device navigates to a user interface associated with a respective user interface element in response to detecting a gaze of the user directed to the respective user interface element for a predetermined time threshold (e.g., 0.1, 0.2, 0.3, 0.4, 0.5, 1, etc. seconds).
In some embodiments, in response to detecting the user perform a gesture with their hand (e.g., touching a thumb to another finger (e.g., index, middle, ring, little finger) on the same hand as the thumb) while their gaze is directed to the respective user interface element, the electronic device navigates to the user interface in less time than the predetermined time threshold.
Navigating to the user interface in response to gaze alone, or navigating to it more quickly in response to the user's gaze plus a hand gesture, provides an efficient way to reach the user interface with either fewer inputs or in less time.
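To make the two paths concrete, here is a minimal Swift sketch of that dwell-versus-pinch logic; the GazeNavigator type, its callbacks, and the 0.5-second threshold are illustrative assumptions, not Apple API or the patent's actual implementation.

```swift
import Foundation

// A minimal sketch, assuming hypothetical types and callbacks; not Apple API.
final class GazeNavigator {
    typealias ElementID = String

    // Dwell time after which gaze alone triggers navigation
    // (the patent cites example thresholds of 0.1 to 1 second).
    private let dwellThreshold: TimeInterval = 0.5
    private var gazeTarget: ElementID?
    private var gazeStart: Date?

    // Hook for whatever actually performs the navigation.
    var navigate: (ElementID) -> Void = { id in print("Navigating to \(id)") }

    // Called for each eye-tracking sample with the element under the user's gaze.
    func gazeMoved(to element: ElementID?, at now: Date = Date()) {
        if element != gazeTarget {
            // Gaze shifted to a new element (or off all elements): restart the dwell timer.
            gazeTarget = element
            gazeStart = element == nil ? nil : now
        } else if let element, let start = gazeStart,
                  now.timeIntervalSince(start) >= dwellThreshold {
            gazeStart = nil        // fire once per dwell
            navigate(element)      // gaze-only navigation after the threshold elapses
        }
    }

    // Called when hand tracking detects a thumb-to-finger pinch.
    // Navigation happens immediately, without waiting out the dwell threshold.
    func pinchDetected() {
        guard let element = gazeTarget else { return }
        gazeStart = nil
        navigate(element)
    }
}
```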
In some embodiments, control elements include selectable options that, when selected, cause the electronic device to perform an operation in the user interface including the control element. For example, a control element includes a navigation bar that includes selectable options to navigate to different pages of a user interface.
In some embodiments, in response to detecting the user's gaze on the control element, the electronic device updates the appearance of the control element (e.g., enlarges, expands, adds detail to the control element). Updating the appearance of the control element in this way provides an efficient way of interacting with the control element and a way of interacting with the user interface with reduced visual clutter when not interacting with the control element.
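As a rough illustration of how such gaze-driven emphasis might look in practice, the following SwiftUI sketch enlarges and brightens a navigation bar while it is being looked at; the isGazed flag stands in for whatever gaze signal the system actually supplies and is purely an assumption.

```swift
import SwiftUI

// Illustrative sketch only: a navigation-bar control that expands when gazed at.
struct GazeExpandingTabBar: View {
    let tabs = ["Listen Now", "Browse", "Library"]
    @State private var isGazed = false     // would be driven by eye tracking in practice
    @State private var selection = 0

    var body: some View {
        HStack(spacing: 16) {
            ForEach(tabs.indices, id: \.self) { index in
                Button(tabs[index]) { selection = index }
            }
        }
        .padding(isGazed ? 20 : 10)
        .scaleEffect(isGazed ? 1.15 : 1.0)   // enlarge / expand while gazed at
        .opacity(isGazed ? 1.0 : 0.6)        // reduced visual prominence otherwise
        .animation(.easeOut(duration: 0.2), value: isGazed)
    }
}
```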
In some embodiments, an electronic device scrolls representations of categories and subcategories in a coordinated manner. In some embodiments, the electronic device concurrently displays representations of categories (e.g., of content, files, user interfaces, applications, etc.) and representations of subcategories within one or more of the categories.
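One simple way to picture that coordination is an index mapping between the two rows. The sketch below, with assumed Category and CoordinatedScrolling types that are not Apple's implementation, keeps a category strip and a subcategory strip aligned as either one scrolls.

```swift
// Sketch of the coordinated-scrolling idea: scrolling the subcategory strip drives
// which category the category strip centers on, and selecting a category jumps the
// subcategory strip to that category's first entry. Types and names are assumptions.
struct Category {
    let name: String
    let subcategories: [String]
}

struct CoordinatedScrolling {
    let categories: [Category]

    // Flat index of each category's first subcategory within one combined strip.
    var firstSubcategoryIndex: [Int] {
        var running = 0
        return categories.map { category in
            defer { running += category.subcategories.count }
            return running
        }
    }

    // Category to highlight when the subcategory strip shows `subIndex`.
    func categoryIndex(forVisibleSubcategory subIndex: Int) -> Int {
        firstSubcategoryIndex.lastIndex(where: { $0 <= subIndex }) ?? 0
    }

    // Subcategory to scroll to when the user selects `categoryIndex`.
    func subcategoryIndex(forSelectedCategory categoryIndex: Int) -> Int {
        firstSubcategoryIndex[categoryIndex]
    }
}
```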
In some embodiments, the electronic device navigates back from user interfaces having different levels of immersion in different ways. In some embodiments, the level of immersion of a user interface corresponds to the number of objects (e.g., virtual objects, representations of real objects) and the degree of visibility of objects other than the user interface displayed concurrently with the user interface.
In some embodiments, the electronic device navigates away from a user interface with a first level of immersion in response to a respective input (e.g., detecting the user's gaze on a representation of a previous user interface for a threshold amount of time). In some embodiments, the electronic device forgoes navigating away from a user interface with a second, higher level of immersion in response to the respective input.
Navigating away from user interfaces with the first level of immersion in response to a respective input and forgoing navigating away from user interfaces with the second, higher level of immersion in response to the respective input provides convenience in user interfaces with the first level of immersion and reduces distraction in user interfaces with the second, higher level of immersion, enabling the user to use the electronic device quickly and efficiently.
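A plausible way to model that immersion-gated back navigation is sketched below; the ImmersionLevel enum and NavigationStackModel type are hypothetical names used only for illustration, not anything defined in the patent.

```swift
// Sketch: a gaze-dwell "back" request is honored only at lower immersion levels.
enum ImmersionLevel: Int, Comparable {
    case windowed = 0, mixed = 1, full = 2
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct ScreenState {
    let name: String
    let immersion: ImmersionLevel
}

final class NavigationStackModel {
    private(set) var stack: [ScreenState] = []

    func push(_ screen: ScreenState) { stack.append(screen) }

    // Gaze dwell on the representation of the previous user interface requests a
    // back navigation. Lightly immersive screens honor it; highly immersive screens
    // forgo it, so the user isn't pulled out of the experience by a stray glance.
    @discardableResult
    func gazeDwellOnPreviousUI() -> Bool {
        guard stack.count > 1, let current = stack.last else { return false }
        guard current.immersion < .full else { return false }  // forgo back-navigation
        stack.removeLast()
        return true
    }
}
```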
Apple's patent FIG. 2 below is a block diagram illustrating a controller of a computer system that is configured to manage and coordinate a CGR experience for the user, including eye tracking (#243) and hand tracking (#244); FIG. 5 is a block diagram illustrating an eye tracking unit of a computer system that is configured to capture gaze inputs of the user; and FIG. 9C illustrates an electronic device (Vision Pro) displaying, via a display generation component, a three-dimensional environment with a user interface.
Apple's granted patent was never made public as a patent application so as not to let the competition see what the company was working on. Secondly, most of Apple's patent figures relate to an iPad so as to further throw off prying eyes.
The patent seemed strange when applying the user interface to an iPad. One could ask: why not just touch the iPad's touch screen? Why would anyone need to use gaze controls plus an in-air gesture to manipulate a 3D icon when simply touching the iPad screen would suffice?
Now that we've all seen the visionOS presentation from WWDC23, we can better comprehend the true nature of this granted patent, whose number one focus really is visionOS. Also, the LinkedIn page of one of the lead human interface designers lists Apple Vision Pro as a project they worked on, which is a dead giveaway.
Yet with that said, Apple never wants an invention to be singular in use, at least in a patent. So Apple notes that the invention could apply to other future devices such as desktop Macs, iPad, iPhone and Apple Watch – even though it's difficult to envision using eye-tracking to move an object when you have a mouse, trackpad or touch input on a display.
For more details, review Apple's granted patent 11720171.
Some of the Team Members on this Apple Project
- Jesse Chand: Human Interface Designer at Apple Vision Pro
- Jeff Faulkner: Apple Design Director, Human Interface Design
- Pol Pla: Principal UX/UI Prototyper, Human Interface, Apple Design Team (now working at Meta as AR Product Design Manager)
- Israel Pastrana Vicente: Human Interface Designer
- Jay Moon: Human Interface Designer
- Jon Dascola: Human Interface Design