Apple Reveals Binaural Video Recording to Provide iPhones with 3D Sound Capture Extending to VR Experiences
On April 30th the US Patent & Trademark Office published a patent application from Apple describing a new 3D audio dimension for video recording that could come to a future iPhone. This kind of audio has been available via Sennheiser AMBEO Smart Headphones for iPhones and received a resoundingly positive review from Unbox Therapy back in late 2017. It caught the attention of 4.4 audio fans back then.
Although Apple's patent is not about earphones with 3D audio, the video does make you wonder whether 3D audio could later be integrated into a future version of AirPods Pro or, better yet, Apple's first over-the-ear headphones, which are rumored to be on their way to market later this year.
In the big picture, Apple's patent application 20200137489, titled "Spatially biased sound pickup for binaural video recording," covers binaural recording of audio, which provides a means for full 3D sound capture: in other words, being able to reproduce the exact sound scene and give the user a sensation of 'being there.'
Apple states that full 3D sound capture can be accomplished through spatial rendering of audio inputs using Head Related Transfer Functions (HRTFs), which modify a sound signal in order to induce in a listener the perception that the sound is originating from any point in space.
While this approach is compelling for applications such as full virtual reality, in which a user can interact both visually and audibly with a virtual environment, in traditional video capture applications three-dimensional sounds can distract the viewer from the screen.
In contrast, monophonic or traditional stereophonic recordings may not provide a sufficient sense of immersion.
An aspect of Apple's invention covers a method for producing a spatially biased sound pickup beamforming function, to be applied to a multi-channel audio recording of a video recording.
The method includes generating a target directivity function. The target directivity function includes a set of spatially biased head related transfer functions.
A left ear set of beamforming coefficients and a right ear set of beamforming coefficients may be generated by determining a best fit for the target directivity function based on a device steering matrix. The left ear set of beamforming coefficients and the right ear set of beamforming coefficients may then be output and applied to the multichannel audio recording to produce more immersive sounding, spatially biased audio for the video recording.
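To make the "best fit" step more concrete, here is a minimal sketch at a single frequency. It assumes the fit is an ordinary least-squares one, which the filing as summarized here does not confirm. The microphone positions, the free-field plane-wave steering model, the placeholder target patterns and all function names are illustrative assumptions, not details taken from Apple's application.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second


def steering_matrix(mic_xy, angles_rad, freq_hz):
    """Assumed free-field plane-wave model of the device.

    Returns a complex array of shape (num_directions, num_mics) holding the
    phase each microphone sees for a unit plane wave from each direction.
    """
    directions = np.stack([np.cos(angles_rad), np.sin(angles_rad)], axis=1)
    delays = directions @ mic_xy.T / SPEED_OF_SOUND  # seconds
    return np.exp(2j * np.pi * freq_hz * delays)


def fit_beamformer(steering, target):
    """Least-squares weights w so that steering @ w approximates target."""
    weights, *_ = np.linalg.lstsq(steering, target, rcond=None)
    return weights


# Hypothetical four-microphone layout on a phone-sized housing (metres).
mic_xy = np.array([[0.00, 0.00], [0.07, 0.00], [0.00, 0.15], [0.07, 0.15]])
angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
A = steering_matrix(mic_xy, angles, freq_hz=1000.0)

# Placeholder target directivity for each ear (level differences only).
# A real target would be the spatially biased HRTFs at this frequency.
target_left = 0.5 * (1.0 + 0.8 * np.sin(angles)) + 0j
target_right = 0.5 * (1.0 - 0.8 * np.sin(angles)) + 0j

w_left = fit_beamformer(A, target_left)
w_right = fit_beamformer(A, target_right)

# Apply the weights to one frequency bin of a multichannel recording,
# here random data of shape (frames, mics) standing in for real audio.
rng = np.random.default_rng(0)
mic_bin = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
left_out = mic_bin @ w_left
right_out = mic_bin @ w_right
```

In a fuller version of this sketch, the same fit would presumably be repeated for every frequency bin, giving one left-ear and one right-ear weight vector per bin.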
Another aspect of the invention is directed towards a method for producing the target directivity function, which includes a set of spatially biased HRTFs. The method includes selecting a set of left ear and right ear head related transfer functions (HRTFs).
The left ear and right ear HRTFs are multiplied by an on-camera emphasis (OCE) function to produce the spatially biased HRTFs. The OCE may be designed to modify the sound profile of the HRTFs to provide emphasis in one or more desired directions, e.g., directly ahead where the camera is being aimed, as a function of the orientation of the recording device when the device is recording video in a specific orientation.
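The multiplication described above can be sketched in a few lines. The shape of the emphasis function is not given in this summary, so the raised-cosine gain below, the `floor` parameter, the placeholder HRTF values and the two look directions are all hypothetical choices made purely for illustration.

```python
import math


def on_camera_emphasis(angle_rad, look_rad, floor=0.3):
    """Hypothetical OCE gain: 1.0 toward the camera's look direction,
    `floor` directly behind it, with a raised-cosine blend in between."""
    return floor + (1.0 - floor) * 0.5 * (1.0 + math.cos(angle_rad - look_rad))


def bias_hrtfs(hrtf_by_angle, look_rad, floor=0.3):
    """Multiply each direction's (left, right) HRTF pair by the OCE gain."""
    return {
        angle: (left * on_camera_emphasis(angle, look_rad, floor),
                right * on_camera_emphasis(angle, look_rad, floor))
        for angle, (left, right) in hrtf_by_angle.items()
    }


# Placeholder single-frequency HRTF values keyed by source angle (radians).
# Real HRTFs would come from a measured or modelled data set.
hrtfs = {
    0.0: (1.0 + 0.0j, 1.0 + 0.0j),
    math.pi / 2: (1.2 + 0.3j, 0.6 - 0.2j),
    math.pi: (0.8 + 0.0j, 0.8 + 0.0j),
    3 * math.pi / 2: (0.6 - 0.2j, 1.2 + 0.3j),
}

# Assumed look directions for the two camera orientations.
REAR_CAMERA_LOOK = 0.0
FRONT_CAMERA_LOOK = math.pi

biased_rear = bias_hrtfs(hrtfs, REAR_CAMERA_LOOK)
biased_front = bias_hrtfs(hrtfs, FRONT_CAMERA_LOOK)
```

Switching the look direction is what ties the emphasis to the device orientation in this sketch: the same HRTF set yields a different spatially biased target depending on which camera is recording.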
Apple's patent FIG. 1 below depicts a future iPhone (a multimedia recording device) during use; FIG. 5 illustrates front camera and rear camera orientations of a future iPhone.
More specifically, patent FIG. 1 above shows an iPhone that doubles as a multimedia recording device #100. The iPhone simultaneously records from a built-in free-field microphone array #133 (composed of several individual microphones #107) and from one of its two built-in cameras, first camera #103 or second camera #106, as illustrated in patent FIG. 5.
The microphone array and the cameras have been strategically placed on the housing of the iPhone. Thereafter, during playback of the recorded audio-video with spatial sound rendering of the multichannel audio, the listener can roughly derive the positions of the sound sources from the small perceived differences in timing and sound level that the spatial sound rendering process introduces, and so enjoys a sense of space. Thus, the voice of the person being interviewed would be perceived as coming directly from the playback screen, while the voices of others in the scene or the sounds of cars in the scene would be perceived as coming from their respective directions.
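For a feel of how small those timing and level differences are, here is a rough sketch. The timing part uses the well-known Woodworth spherical-head approximation; the level part is a crude placeholder, since real level differences depend strongly on frequency. The head radius, the 10 dB ceiling and the function names are assumptions for illustration and do not come from the patent.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # metres per second


def interaural_time_difference(azimuth_rad):
    """Woodworth spherical-head approximation, for |azimuth| <= pi/2.

    Positive azimuth means the source is to the listener's right, and a
    positive result means the sound reaches the right ear first.
    """
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))


def interaural_level_difference_db(azimuth_rad, max_ild_db=10.0):
    """Crude placeholder: level difference grows with the sine of azimuth."""
    return max_ild_db * math.sin(azimuth_rad)


for degrees in (0, 30, 60, 90):
    azimuth = math.radians(degrees)
    itd_us = interaural_time_difference(azimuth) * 1e6
    ild_db = interaural_level_difference_db(azimuth)
    print(f"{degrees:>2} deg: ITD {itd_us:6.1f} us, ILD {ild_db:4.1f} dB")
```

Even for a source directly to one side, the timing difference in this model stays well under a millisecond, which is why the article calls these differences small.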
Further, a more compelling cinematic experience can be obtained where the audio recording is given a spatial profile (by the spatial sound rendering process) that better matches the spatial focus of the audio-video recording.
In the example of FIG. 1, this means that the voices of others in the scene and other ambient sounds that were captured (such as cars or buses) should be spatially rendered, but in a way that enables the listener to focus on the voice of the interviewee.
Apple's patent FIG. 2 below is a diagram of an audio system for outputting spatially biased beamforming coefficients that are applied to multichannel audio pickup from a future iPhone; FIG. 3 illustrates a flow diagram of a process for generating spatially biased beamforming coefficients.
Considering that this is a patent application, the timing of such a product coming to market is unknown at this time.