Future AirPods Max will deliver advanced Spatial Audio by assessing a user's head orientation relative to their torso via cameras
Both Mark Gurman and Ming-Chi Kuo have predicted that future AirPods will come with integrated cameras, and those next-gen AirPods are expected to enhance the Spatial Audio experience. According to Apple's latest patent, the company has been working on such a headphone/media system for at least five years, well before the rumors began.
In a patent application published today, Apple describes the reasoning behind integrating a camera system into AirPods Max and AirPods (Pro) to provide next-gen Spatial Audio.
Spatial Audio Reproduction Based On Head-to-Torso Orientation
Spatial audio can be played using headphones that are worn by a user. For example, the headphones can reproduce a spatial audio signal communicated by a device to simulate a soundscape around the user. An effective spatial sound reproduction can recreate sounds such that the user perceives the sound as coming from a location within the soundscape external to the user's head, just as the user would experience the sound if it were encountered in the real world.
Apple notes in its patent filing that existing virtual audio rendering systems must know the user's head orientation relative to the virtual sound source in order to select an appropriate head-related transfer function (HRTF). Typically, the HRTF is defined and measured as depending on an azimuth angle, an elevation angle, and sometimes a distance between the virtual sound source and the user's head.
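To make that parameterization concrete, here is a minimal sketch of a conventional HRTF lookup keyed only by head-to-source geometry. The type names, the 15° grid, and the coarse distance bucket are illustrative assumptions, not anything from Apple's filing or APIs.

```swift
import Foundation

// Hypothetical sketch: an HRTF dataset indexed only by head-to-source
// geometry (azimuth, elevation, and optionally distance), as the patent
// describes for conventional renderers. All names here are illustrative.
struct HRTFKey: Hashable {
    let azimuthDegrees: Int      // angle around the listener, quantized
    let elevationDegrees: Int    // angle above/below the horizontal plane
    let distanceBucket: Int      // optional coarse distance index
}

struct HRTF {
    let leftImpulseResponse: [Float]
    let rightImpulseResponse: [Float]
}

struct ConventionalHRTFDataset {
    private let table: [HRTFKey: HRTF]

    init(table: [HRTFKey: HRTF]) { self.table = table }

    /// Returns the filter pair measured closest to the requested direction.
    /// Note the limitation the patent points out: there is no parameter for
    /// how the torso is oriented relative to the head.
    func lookup(azimuth: Double, elevation: Double, distanceBucket: Int = 0) -> HRTF? {
        let key = HRTFKey(
            azimuthDegrees: Int((azimuth / 15).rounded()) * 15,   // 15° grid, illustrative
            elevationDegrees: Int((elevation / 15).rounded()) * 15,
            distanceBucket: distanceBucket
        )
        return table[key]
    }
}
```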
HRTF dataset definitions to date do not encapsulate a dimension for the orientation of the rest of the body relative to the user's head. More particularly, changes away from a nominal forward-facing head-to-torso orientation are not accounted for when using an HRTF dataset; the torso is assumed to rotate and move with the head. As a result, a user who turns their head to the right while keeping their torso stationary, e.g., facing forward, will have the unsettling experience of hearing sound as though they had turned their torso to the right concurrently with their head. In other words, these virtual audio rendering systems do not differentiate between cases where the head and torso move separately and cases where they move together. This disregard for head-to-torso orientation results in spatial audio renderings that do not accurately reproduce the effects that torso orientation has on sound sources in real life.
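A tiny worked example of that limitation, using assumed names and world-frame yaw angles only: with a head-only parameterization, a 90° head turn over a stationary torso and a 90° turn of the whole body reduce to the same HRTF query.

```swift
import Foundation

// Illustrative only: a conventional renderer keys its HRTF on the
// head-to-source angle alone, so these two poses are indistinguishable.
struct Pose {
    var headYawDegrees: Double    // head orientation in world coordinates
    var torsoYawDegrees: Double   // torso orientation in world coordinates
}

let sourceAzimuthWorld = 0.0  // sound source straight ahead in world space

// Case A: head turns right 90°, torso stays facing forward.
let headOnlyTurn = Pose(headYawDegrees: 90, torsoYawDegrees: 0)

// Case B: head and torso turn right 90° together.
let wholeBodyTurn = Pose(headYawDegrees: 90, torsoYawDegrees: 90)

// The only quantity a head-only renderer consults.
func headToSourceAzimuth(_ pose: Pose) -> Double {
    sourceAzimuthWorld - pose.headYawDegrees
}

print(headToSourceAzimuth(headOnlyTurn))   // -90.0
print(headToSourceAzimuth(wholeBodyTurn))  // -90.0 — identical, despite different torso poses
```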
Apple's patent describes a media system, and a method of using that system, to accurately reproduce virtual audio while taking into account the orientation of a user's head relative to their torso.
In an embodiment, the media system includes one or more processors configured to determine a head-to-source orientation and a head-to-torso orientation.
The head-to-source orientation can be a relative position and/or orientation between a head of a user and a sound source. The relative orientation can be determined from head tracking data generated by a head tracking device, such as a head-mounted device (like AirPods Max) having inertial measurement units.
The head-to-torso orientation can be a relative position and/or orientation between the head of the user and a torso of the user. The relative orientation can be directly measured, e.g., by one or more sensors of the head tracking device or a companion device. Alternatively, the relative orientation can be inferred based only on the head tracking data generated by the head tracking device.
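A minimal sketch of how those two quantities could be represented, assuming world-frame orientations from IMU-based head tracking and a torso orientation that is either measured or inferred. Type and property names are my own, not Apple's.

```swift
import simd

// Hedged sketch (not Apple's implementation) of the two orientations the
// patent's media system determines.
struct OrientationEstimates {
    var headInWorld: simd_quatd     // from IMU-based head tracking
    var torsoInWorld: simd_quatd    // measured (e.g., camera/sensor) or inferred
    var sourcePositionInWorld: SIMD3<Double>
    var headPositionInWorld: SIMD3<Double>

    /// Direction to the sound source expressed in the head frame
    /// (the basis of the head-to-source orientation).
    var headToSourceDirection: SIMD3<Double> {
        let worldDirection = simd_normalize(sourcePositionInWorld - headPositionInWorld)
        return headInWorld.inverse.act(worldDirection)
    }

    /// Relative rotation between head and torso (the head-to-torso orientation).
    var headToTorso: simd_quatd {
        torsoInWorld.inverse * headInWorld
    }
}
```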
Estimation of the head-to-torso orientation based on the head tracking data can include determining that the torso moves toward alignment with the head when the head orientation data meets a head movement condition.
For example, the torso may move when the head moves. Alternatively, the torso may move when the head has moved and then stopped moving at a new orientation. In an aspect, the torso may move when the head moves in a particular pattern. In any case, the movement of the torso can be related to the head movement, e.g., numerically through an average or median of head tracking data, or in some other manner, e.g., by moving the torso according to a particular pattern that corresponds to the pattern detected for the head movement.
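One way such an inference could look in code is sketched below: the torso estimate drifts toward the average of recent head yaw samples once the head has stopped at a new orientation. The thresholds, history length, and smoothing rate are illustrative assumptions, not values from the patent.

```swift
import Foundation

// Hedged sketch of torso-yaw inference from head tracking data alone.
struct TorsoYawEstimator {
    private(set) var torsoYaw: Double = 0          // estimated torso yaw, degrees
    private var recentHeadYaws: [Double] = []      // short history of head yaw samples

    private let historyLength = 30                 // ~0.5 s at 60 Hz, illustrative
    private let stillnessThreshold = 2.0           // degrees of spread counted as "stopped"
    private let alignmentRate = 0.05               // fraction of the gap closed per update

    mutating func update(headYaw: Double) {
        recentHeadYaws.append(headYaw)
        if recentHeadYaws.count > historyLength {
            recentHeadYaws.removeFirst()
        }
        guard recentHeadYaws.count == historyLength,
              let minYaw = recentHeadYaws.min(),
              let maxYaw = recentHeadYaws.max() else { return }

        // Head movement condition: the head has moved and then stopped.
        let headIsStill = (maxYaw - minYaw) < stillnessThreshold
        let headIsTurnedAway = abs(headYaw - torsoYaw) > stillnessThreshold

        if headIsStill && headIsTurnedAway {
            // Relate torso motion to the head data, e.g., drift toward the
            // average of the recent head yaw samples.
            let target = recentHeadYaws.reduce(0, +) / Double(recentHeadYaws.count)
            torsoYaw += (target - torsoYaw) * alignmentRate
        }
    }

    /// Head-to-torso yaw that could feed HRTF selection.
    var headToTorsoYaw: Double {
        (recentHeadYaws.last ?? torsoYaw) - torsoYaw
    }
}
```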
Inference of the head-to-torso orientation can also be based on contextual data that exists at the time of the head movement. For example, the inference may be based on a current state of the user, e.g., whether the user is ambulatory, or a current use of the head tracking device, e.g., whether the system is being used to reproduce a soundscape of a movie. In any case, the contextual information can provide additional information to control whether or how the estimation of torso movement is made.
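A small illustration of such contextual gating, with assumed enum and function names: context decides how aggressively the torso estimate is allowed to follow the head.

```swift
// Illustrative context gate: contextual signals control whether, and how
// quickly, the torso estimate re-aligns with the head.
enum UserActivity { case stationary, walking }
enum PlaybackContext { case movieSoundscape, music, call }

func torsoAlignmentRate(activity: UserActivity, context: PlaybackContext) -> Double {
    switch (activity, context) {
    case (.walking, _):
        // An ambulatory user's torso typically follows the head quickly.
        return 0.2
    case (.stationary, .movieSoundscape):
        // A seated viewer likely keeps the torso facing the screen.
        return 0.02
    default:
        return 0.05
    }
}
```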
Based on the head-to-source orientation and the head-to-torso orientation (whether measured or inferred), the media system can select an appropriate head-related transfer function (HRTF) to realistically render spatial audio.
The HRTF may be numerically simulated to represent the particular pose of the user that is being rendered. An audio filter based on the HRTF can be applied to an audio input signal to generate an audio output signal. When played by the media system, the audio output signal can recreate spatial audio that accounts for the user's particular pose and accurately reproduces real-life listening.
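Below is a minimal rendering sketch under assumed names: a filter pair is keyed by both the head-to-source azimuth and the head-to-torso yaw (the added dimension), then convolved with the input to produce a binaural output. A production renderer would interpolate between measured HRTFs and convolve in the frequency domain; this direct time-domain version is just for illustration.

```swift
import Foundation

struct PoseKey: Hashable {
    let headToSourceAzimuth: Int   // degrees, quantized
    let headToTorsoYaw: Int        // degrees, quantized — the added dimension
}

struct BinauralFilter {
    let left: [Float]
    let right: [Float]
}

// Direct time-domain convolution, for illustration only.
func convolve(_ signal: [Float], _ kernel: [Float]) -> [Float] {
    guard !signal.isEmpty, !kernel.isEmpty else { return [] }
    var output = [Float](repeating: 0, count: signal.count + kernel.count - 1)
    for (i, s) in signal.enumerated() {
        for (j, k) in kernel.enumerated() {
            output[i + j] += s * k
        }
    }
    return output
}

func render(input: [Float],
            headToSourceAzimuth: Double,
            headToTorsoYaw: Double,
            dataset: [PoseKey: BinauralFilter]) -> (left: [Float], right: [Float])? {
    let key = PoseKey(
        headToSourceAzimuth: Int((headToSourceAzimuth / 15).rounded()) * 15,
        headToTorsoYaw: Int((headToTorsoYaw / 15).rounded()) * 15
    )
    guard let filter = dataset[key] else { return nil }
    return (convolve(input, filter.left), convolve(input, filter.right))
}
```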
Apple's patent FIG. 1 below is a pictorial view of a user positioned relative to a sound source in a soundscape; FIG. 2 is a flowchart of a method of reproducing spatial audio based on a head-to-torso orientation; FIG. 4 is a pictorial view of a head of a user moving relative to a sound source in a soundscape.
Apple's patent FIG. 8 above is a pictorial view of various head-to-torso measurements for determining a head-to-torso orientation directly; FIG. 9 is a pictorial view of a head-above-torso mesh generation used to simulate a head-related transfer function for various combinations of head-to-source and head-to-torso orientations.
For full details, review patent application 20240357308.