Apple files new 'Spatial Audio' patent for its future HMD that's likely to apply to AirPods Pro and a Next-Gen Apple TV
Today the US Patent & Trademark Office published a patent application from Apple that generally relates to computerized data processing systems and methods for audio processing, and in particular to spatial audio processing.
Spatial audio was advanced at this year's WWDC, and we covered it in our report titled "It was a Historic WWDC Keynote as Apple introduced Macs with 'Apple Silicon' coming to market this Fall & Spatial Audio for AirPods Pro." To date, Patently Apple has covered at least two spatial audio related patents from Apple (01 and 02).
Today Apple's patent covers 'Spatial Audio Downmixing' and the hardware it will be available on. Spatial Audio will be supported on future HDTVs and streaming services like Apple TV+ via MPEG-H and/or 'Higher-Order Ambisonics.'
Apple is likely to support this in its next Apple TV box, and we already know that Apple's upcoming AirPods Pro will support Spatial Audio, as covered during WWDC20. Below is the WWDC20 video, set to start at the point where the Spatial Audio segment begins; that segment ends at the 45:10 mark, when you can stop the video.
Spatial Audio will likely extend to Apple's first over-ear headphones and be able to sync with Apple's next generation of Apple TV. The audio will be as good as, or even better than, what you now experience in a theater with Dolby Atmos.
Apple's patent background notes that producing three-dimensional (3D) sound effects in augmented reality (AR), virtual reality (VR), and mixed reality (MR) applications, collectively encompassed by the term "simulated reality" (SR), is commonly used to enhance media content.
Examples of spatial audio formats designed to produce 3D sound include MPEG-H (Moving Picture Experts Group) 3D Audio standards, HOA (Higher-order Ambisonics) spatial audio techniques, and DOLBY ATMOS surround sound technology.
For example, sound designers add 3D sound effects by manipulating sounds contained in spatial audio objects to enhance a scene in an SR application, where the sounds are ambient sounds and/or discrete sounds that can be virtually located for playback by the spatial audio system anywhere in the virtual 3D space created by the SR application.
Hardware Supporting Spatial Audio
Apple's patent provides us with a broad view of the hardware that will support spatial audio in the future. Apple further notes that many electronic systems will enable an individual to interact with and/or sense various SR settings.
The first and most important hardware will be head mounted systems. Apple describes a head mounted system that may have an opaque display and speaker(s). Alternatively, a head mounted system may be designed to receive an external display (e.g., a smartphone). The head mounted system may have imaging sensor(s) and/or microphones for taking images/video and/or capturing audio of the physical setting, respectively.
A head mounted system also may have a transparent or semi-transparent display. The transparent or semi-transparent display may incorporate a substrate through which light representative of images is directed to an individual's eyes. The display may incorporate LEDs, OLEDs, a digital light projector, a laser scanning light source, liquid crystal on silicon, or any combination of these technologies.
Other examples of SR systems include heads up displays, automotive windshields with the ability to display graphics, windows with the ability to display graphics, lenses with the ability to display graphics, headphones or earphones, speaker arrangements, input mechanisms (e.g., controllers having or not having haptic feedback), tablets, smartphones, and desktop or laptop computers.
Spatial Audio Downmixing
The heart of Apple's patent covers spatial audio downmixing. It enables developers of SR (augmented/virtual/mixed reality) applications, as well as listeners in an SR experience created by such an application, to preview a sound from the audio data in which it has been encoded and that can be composed into the SR application.
In one embodiment the audio data in which the sound is recorded or encoded is stored as a spatial audio object that preserves spatial characteristics of one or more recorded sounds.
In one embodiment, the spatial audio object contains several channels of audio data representing the one or more recorded sounds, each channel being associated with any one or more of a direction and a location (distance), e.g. of a source of the recorded sound.
Apple notes that there may be two or more such channels associated with a given direction or location, e.g., a multi-channel microphone pickup. In other embodiments, the spatial audio object contains multiple channels of an ambisonics-format (spherical harmonics format) representation of a sound field, in which case each channel is associated with a respective spatial distribution, e.g., B-format WXYZ channels. To then enable the aural preview, the audio channels are subjected to a spatial audio downmixing operation.
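To make the downmixing idea concrete, here is a minimal sketch of mixing first-order B-format WXYZ channels down to a stereo preview. This is an illustrative mid/side-style decode, not the method claimed in the patent; the function name, coefficients, and the choice to ignore the X and Z channels are all assumptions made for brevity.

```python
import numpy as np

def bformat_to_stereo(w, x, y, z):
    """Downmix first-order B-format (WXYZ) channels to a stereo preview.

    Each argument is a 1-D NumPy array of samples. This illustrative
    decode treats W (omnidirectional) as the 'mid' component and Y
    (the left-right figure-of-eight) as the 'side' component; the X
    (front-back) and Z (up-down) channels are ignored in this sketch.
    """
    mid = w * np.sqrt(0.5)   # scale the omni channel down
    side = 0.5 * y           # left-right component
    left = mid + side
    right = mid - side
    return np.stack([left, right], axis=-1)

# Tiny usage example with a one-sample "sound field"
w = np.array([1.0]); x = np.array([0.2]); y = np.array([0.4]); z = np.array([0.0])
stereo = bformat_to_stereo(w, x, y, z)
```

Because the sound field leans to the left (positive Y), the left output sample comes out louder than the right one.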
Apple's patent FIG. 4 illustrates the spatial sound preview process in further detail. In one embodiment, the process begins with a composed spatial audio object #404, such as the combined forest/waterfall ambient sounds presented in FIG. 1 further below.
Further to FIG. 4, Apple notes that the spatial sound preview user interface #406 generates (operation #408) a visualized spatial sound object #206 (see FIG. 2 below), such as a virtual globe (e.g., sphere, bubble, cube, polyhedron, etc.) in response to a request (e.g., from the user) to preview the sound represented in the composed spatial audio object #404.
In the example of FIG. 4, the object is a virtual sphere having a central origin from which all of the spatial sounds represented by the different triangles will emanate.
In other words, each triangle may represent a loudspeaker (acoustic output transducer) that is pointed outward and placed at the same location (the central origin of the virtual sphere).
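One way to picture those outward-pointing channel orientations is as unit direction vectors distributed around the sphere's central origin. The sketch below is a hypothetical construction (not from the patent) that uses a Fibonacci lattice, a common technique for spacing points roughly evenly over a sphere.

```python
import math

def fibonacci_sphere_directions(n):
    """Return n roughly evenly spaced unit vectors on a sphere.

    Each vector could orient one outward-pointing virtual loudspeaker
    placed at the sphere's central origin (illustrative construction).
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n          # latitude from ~top to ~bottom
        r = math.sqrt(max(0.0, 1.0 - y * y))   # radius of the latitude circle
        theta = golden_angle * i               # spiral around the sphere
        dirs.append((r * math.cos(theta), y, r * math.sin(theta)))
    return dirs

# e.g., eight channel orientations for eight "triangles" on the sphere
dirs = fibonacci_sphere_directions(8)
```

Each returned tuple is a unit vector, so it can directly serve as the facing direction of one virtual loudspeaker at the origin.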
Apple's patent FIG. 1 below is a block diagram illustrating an overview of spatial sound use in SR environments; FIG. 2 is a block diagram illustrating a spatial sound preview.
In one embodiment, generating the visualized spatial sound object includes adding an image to it for each of the oriented channels. The image may be a still picture, or it may be part of a video sequence.
The image may be that of a source of the predominant recorded sound in the oriented channel, or of a scene associated with the recorded sound, such as a tree for a forest sound, a car for a city sound, a wave for a beach sound, a video of crashing water in a waterfall, a video of crashing waves at a beach, a video of trees moving in the wind, and the like.
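A simple way to model that per-channel pairing of a direction and an image is shown below. This is a hypothetical data-structure sketch; the class names, fields, and asset filenames are all invented for illustration and do not appear in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class OrientedChannel:
    """One oriented channel of a visualized spatial sound object (illustrative)."""
    direction: tuple      # unit vector from the sphere's central origin
    image_path: str = ""  # still picture or video shown for this channel

@dataclass
class VisualizedSoundObject:
    """Virtual globe whose channels each carry a direction and an image."""
    channels: list = field(default_factory=list)

    def add_channel(self, direction, image_path=""):
        self.channels.append(OrientedChannel(direction, image_path))

# Hypothetical usage: a forest/waterfall scene with two oriented channels
viz = VisualizedSoundObject()
viz.add_channel((0.0, 0.0, 1.0), "waterfall.mp4")    # invented asset name
viz.add_channel((1.0, 0.0, 0.0), "forest_tree.png")  # invented asset name
```

When the user requests a preview, a renderer could walk `channels` to draw each image on the virtual globe at its channel's direction.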
You can learn more of the details in Apple's patent application 20200221248, which was published today by the U.S. Patent Office and was filed back in Q1 2020, a little over three months before Apple introduced Spatial Audio at this year's WWDC.