Apple Introduces the Use of Neural Networks to Advance Eye and Gaze Tracking in a Future HMD
Patently Apple has covered many eye and gaze tracking patents relating to Apple's future head mounted devices such as a full VR headset or mixed reality glasses (01, 02, 03 and 04). Every such patent covers eye and gaze tracking in new and complementary ways. It's a crucial element in future headsets and different Apple engineering teams have focused on various parts of this massive project over time.
Today the US Patent & Trademark Office published a patent application from Apple that relates to event-based gaze tracking using neural networks. Apple's patent relates only to technology intended for a future HMD, not AR glasses.
In Apple's patent background they note that existing gaze tracking systems determine gaze direction of a user based on shutter-based camera images of the user's eye. Existing gaze tracking systems often include a camera that transmits images of the eyes of the user to a processor that performs the gaze tracking. Transmission of the images at a sufficient frame rate to enable gaze tracking requires a communication link with substantial bandwidth and using such a communication link increases heat generated and power consumption by the device.
Overall, Apple's patent covers devices, systems, and methods that use neural networks for event camera-based gaze tracking.
In various implementations, the camera includes an event camera with a plurality of light sensors at a plurality of respective locations that, in response to a particular light sensor detecting a change in intensity of light, generates an event message indicating a particular location of the particular light sensor.
An event camera may include or be referred to as a dynamic vision sensor (DVS), a silicon retina, an event-based camera, or a frame-less camera.
The event camera generates (and transmits) data regarding changes in light intensity as opposed to a larger amount of data regarding absolute intensity at each light sensor. Further, because data is generated when intensity changes, in various implementations, the light source is configured to emit light with modulating intensity.
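To make the idea concrete, here is a minimal Python sketch of how an event message could be produced when a light sensor sees a change in intensity. It is only an illustration and not Apple's implementation: a real event camera fires asynchronously per pixel, whereas this sketch compares two intensity snapshots. The names Event and generate_events, the polarity field, and the threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical event message: which sensor fired, in which direction, and when."""
    x: int
    y: int
    polarity: int  # +1 = brighter, -1 = darker (an assumption for this sketch)
    t_ms: float


def generate_events(prev: List[List[int]], curr: List[List[int]],
                    t_ms: float, threshold: int = 10) -> List[Event]:
    """Emit one event per sensor whose intensity changed by at least `threshold`.

    Sensors with no significant change produce no data at all, which is why
    the output is much smaller than a full frame.
    """
    events = []
    for y, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            delta = c - p
            if abs(delta) >= threshold:
                events.append(Event(x, y, 1 if delta > 0 else -1, t_ms))
    return events
```

In this sketch a static scene yields an empty list, which hints at why the patent describes a light source with modulating intensity: without change, there is nothing to report.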
In various implementations, the asynchronous pixel event data from one or more event cameras is accumulated to produce one or more inputs to a neural network configured to determine one or more gaze characteristics, e.g., pupil center, pupil contour, glint locations, gaze direction, etc.
In other implementations, event camera data is used as input to a neural network in other forms, e.g., individual events or events within a predetermined time window (e.g., 10 milliseconds).
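As a rough illustration of accumulating asynchronous events into a network input, the Python sketch below counts the events that fall inside a fixed time window at each sensor location. The 10 millisecond window comes from the patent's example; the function name accumulate_events, the (x, y, t_ms) tuple layout, and the choice of a simple per-pixel count are assumptions for this sketch, not details confirmed by the filing.

```python
def accumulate_events(events, width, height, window_start_ms, window_ms=10.0):
    """Build a 2D count image from events inside one time window.

    `events` is an iterable of (x, y, t_ms) tuples (a hypothetical layout).
    Events outside [window_start_ms, window_start_ms + window_ms) are ignored.
    """
    image = [[0] * width for _ in range(height)]
    window_end_ms = window_start_ms + window_ms
    for x, y, t_ms in events:
        if window_start_ms <= t_ms < window_end_ms:
            image[y][x] += 1
    return image
```

The resulting grid has a fixed size regardless of how many events arrived, which is what makes it convenient as an input to a conventional neural network.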
In various implementations, a neural network that is used to determine gaze characteristics is configured to do so efficiently. Efficiency is achieved, for example, by using a multi-stage neural network.
The first stage of the neural network is configured to determine an initial gaze characteristic, e.g., an initial pupil center, using reduced resolution inputs. For example, rather than using a 400×400 pixel input image, the resolution of the input image at the first stage can be reduced down to 50×50 pixels. The second stage of the neural network is configured to determine adjustments to the initial gaze characteristic using location-focused input, e.g., using only a small input image centered around the initial pupil center.
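The coarse-to-fine flow described above can be sketched in Python as follows. This is a sketch of the data flow only: the two "stages" here are simple stand-ins (darkest low-resolution cell, then centroid of dark pixels) and not the neural networks from the patent. The 400×400 and 50×50 sizes come from the patent's example; the function names, the 64×64 crop size, and the darkness threshold are assumptions.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (400x400 -> 50x50 when factor=8)."""
    h, w = len(image), len(image[0])
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [image[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def coarse_center(small, factor):
    """Stage 1 stand-in: pick the darkest low-res cell, mapped to full-res coordinates."""
    _, x, y = min((v, x, y) for y, row in enumerate(small) for x, v in enumerate(row))
    return (x * factor + factor // 2, y * factor + factor // 2)


def crop_around(image, center, half):
    """Cut a (2*half) x (2*half) patch around `center`, clamped to the image bounds."""
    cx, cy = center
    h, w = len(image), len(image[0])
    x0 = max(0, min(cx - half, w - 2 * half))
    y0 = max(0, min(cy - half, h - 2 * half))
    patch = [row[x0:x0 + 2 * half] for row in image[y0:y0 + 2 * half]]
    return patch, (x0, y0)


def refine_center(patch, origin, dark_threshold=50):
    """Stage 2 stand-in: centroid of dark pixels in the patch, in full-res coordinates."""
    xs, ys = [], []
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            if v < dark_threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing dark enough in the patch
    return (origin[0] + sum(xs) / len(xs), origin[1] + sum(ys) / len(ys))
```

The efficiency argument shows up in the pixel counts: the first stage looks at 2,500 values instead of 160,000, and the second stage in this sketch looks at only 4,096 values around the initial estimate.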
Apple's patent FIG. 1 below is a block diagram of an example operating environment that includes a controller #110 and a head-mounted device (HMD) #120. The controller is configured to manage and coordinate an augmented reality/virtual reality (AR/VR) experience for the user.
According to some implementations, the HMD presents an AR/VR experience to the user while the user is virtually and/or physically present within the scene #105. In some implementations, while presenting an AR experience, the HMD is configured to present AR content and to enable optical see-through of the scene. In some implementations, while presenting a VR experience, the HMD is configured to present VR content and to enable video pass-through of the scene.
Apple's patent FIG. 2 above is a block diagram of an example controller. In some implementations, the tracking unit #244 is configured to map the scene and to track the position/location of the HMD with respect to the scene.
Apple's patent FIG. 3 below is a block diagram of an example of the head-mounted device (HMD). In various implementations, the AR/VR presentation module #340 includes a data obtaining unit 342, an AR/VR presenting unit 344, a gaze tracking unit 346, and a data transmitting unit 348.
In some implementations, the gaze tracking unit is configured to determine a gaze tracking characteristic of a user based on event messages received from an event camera. To that end, in various implementations, the gaze tracking unit includes instructions for configured neural networks.
Apple's patent FIG. 4 above is an overview of the Head Mounted Display System.
Apple's patent FIG. 5 below is a block diagram of an event camera; FIG. 7 is a functional block diagram of an event camera-based gaze tracking process. The gaze tracking process 700 outputs a gaze direction of a user based on event messages received from the event camera #710.
Further to FIG. 7 above, the intensity reconstruction image generator #750, the timestamp image generator #760, and the glint image generator #770 provide images that are input to the neural network 780, which is configured to generate the gaze characteristic.
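As one possible reading of what a timestamp image could look like, the Python sketch below stores, for each sensor location, how recently that sensor fired an event. The article does not spell out the exact encoding, so the recency scale from 0 to 1, the 10 millisecond horizon, the (x, y, t_ms) tuple layout, and the name timestamp_image are all assumptions made for illustration.

```python
def timestamp_image(events, width, height, now_ms, horizon_ms=10.0):
    """Per-pixel recency in [0, 1].

    A value near 1.0 means the sensor fired just before `now_ms`;
    0.0 means it did not fire within the last `horizon_ms` milliseconds.
    `events` is an iterable of (x, y, t_ms) tuples (a hypothetical layout).
    """
    image = [[0.0] * width for _ in range(height)]
    for x, y, t_ms in events:
        age_ms = now_ms - t_ms
        if 0 <= age_ms < horizon_ms:
            # Keep the most recent event if a sensor fired more than once.
            image[y][x] = max(image[y][x], 1.0 - age_ms / horizon_ms)
    return image
```

An image like this could then sit alongside the intensity reconstruction image and the glint image as separate input channels to the network, though how Apple actually combines them is described only in the patent itself.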
There is a lot of fine detail in each of the patent figures illustrated above, and the patent application itself provides further details on the heart of the patent, which covers "neural networks."
Apple's patent application number 20200348755 was filed back on July 21, 2020. Considering that this is a patent application, the timing of such a product to market is unknown at this time.