Apple reveals that cameras could be integrated into future versions of AirPods Max & Apple TV to ascertain virtual speaker placement in a room
Today the U.S. Patent and Trademark Office officially granted Apple a patent that relates to audio processing with virtual speakers. More specifically, Apple's patent reveals that in the future, cameras could be integrated into both AirPods Max and the Apple TV box so as to ascertain the appropriate positioning of virtual speakers within a given room.
Apple's granted patent, which was never published as a patent application, states that speakers can be virtualized through playback on a headphone set. For example, if a user watches a movie with a headphone set on, movie audio that is played through the headphone set can be virtualized so that the user perceives sound to be coming from virtualized speakers with set positions located around the user.
Locations of the virtual speakers can be tailored to a user's setup, for example, the television's size, the television's location, and the listening area (which can include the geometry of the room in which the television is located). An estimated location of the user can also factor into where the speakers are placed. For example, if a user's sitting position can be estimated, then virtual speakers that might be dedicated to surround sound can be placed at the user's side or behind the user. Based on analysis of such factors, virtual speaker locations can be assigned and/or optimized in a manner that provides a positive experience to the user.
In one aspect, a method of virtualizing speakers (e.g., for playback on a headphone set) can include: determining a location of a television; assigning one or more locations of one or more virtual speakers based on the location of the television, wherein the one or more virtual speakers include a first virtual speaker located at the television; determining, in real-time (e.g., continuously and concurrently with the playback of the spatialized audio signals), a position of a head of a user; and spatializing, based on the position of the head and the one or more locations of the one or more virtual speakers, one or more audio signals with a spatial renderer to generate spatialized audio signals that, when used to drive a left speaker and a right speaker of a headphone set, are converted to sound that is perceived by the user to be located at the one or more locations of the one or more virtual speakers.
An audio system (#30) is shown for virtualizing speakers through a headset in patent FIG. 2 below. A sensor (e.g., one or more cameras #36) can generate image data (#34) that can include images of a television (#35) and the environment of the television (e.g., a room, a backyard, etc.). An image processor (#32) can, using computer vision technologies, recognize the television and the environment, and generate a map (#33) of the television in the environment.
A virtual layout generator (#38) can use the mapping of the television and the environment to assign locations to one or more virtual speakers within the environment. At least one of those virtual speakers can be located at the television so that the user can hear sounds coming from the television, thereby providing a natural listening experience.
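To make the layout step concrete, here is a minimal Python sketch of how virtual speaker locations might be assigned once the television and the listener have been located. It is an illustration only, not Apple's method: the 2D floor-plan coordinates, the function name assign_virtual_speakers and the speaker angles (a conventional 5.1-style arrangement) are all assumptions that do not come from the patent.

```python
import math

# Hypothetical azimuths in degrees, measured from the listener-to-TV direction.
# Positive angles are counterclockwise when viewed from above, i.e. toward the
# listener's left. These values are a conventional choice, not from the patent.
LAYOUT = {
    "center": 0.0,
    "left": 30.0,
    "right": -30.0,
    "surround_left": 110.0,
    "surround_right": -110.0,
}

def assign_virtual_speakers(tv_xy, listener_xy, layout=LAYOUT):
    """Place virtual speakers on a circle around the listener.

    The circle's radius is the listener-to-TV distance, so the speaker at
    azimuth 0 lands exactly on the television.
    """
    dx = tv_xy[0] - listener_xy[0]
    dy = tv_xy[1] - listener_xy[1]
    radius = math.hypot(dx, dy)
    if radius == 0.0:
        raise ValueError("listener and television cannot share a position")
    base = math.atan2(dy, dx)  # direction from the listener to the TV
    positions = {}
    for name, azimuth_deg in layout.items():
        angle = base + math.radians(azimuth_deg)
        positions[name] = (
            listener_xy[0] + radius * math.cos(angle),
            listener_xy[1] + radius * math.sin(angle),
        )
    return positions
```

With a television two meters in front of the listener, assign_virtual_speakers((0.0, 2.0), (0.0, 0.0)) puts the center speaker on the TV and the surround speakers slightly behind the listener, which matches the article's description of surround speakers placed at the user's side or behind the user.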
Headphone sensor #47 (e.g., one or more cameras) can generate image data (#42) that can be processed by a tracking processor (#44) to track a position of a user's head. Computer vision and known tracking algorithms can be used by the tracking processor to track the user's head. The tracking processor can use the mapping of the TV and TV environment as a reference to track the user's head.
For example, if the TV is within view of the sensor and contained in the image, the location and angle of the television can provide a reference to determine the position of the user's head. Other objects or patterns recognized in the image data can also be used as reference.
A spatial renderer (#50) can spatialize audio signals received from an audio content source (#40, e.g., the Apple TV shown as #90 in FIG. 5). The content source can be a media player, a media server, a computing device, or other content providing means.
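As a rough illustration of why head tracking matters to the renderer, the Python sketch below computes where a fixed virtual speaker sits relative to the direction the head is facing, then derives left and right headphone gains from that angle. Everything here is an assumption made for illustration: the function names, the 2D coordinates, and especially the constant-power pan, which is only a crude stand-in. A real spatial renderer would typically rely on head-related transfer functions rather than simple gains, and the patent summary above does not specify the rendering technique.

```python
import math

def relative_azimuth(head_xy, head_yaw, speaker_xy):
    """Angle of the speaker relative to the facing direction, in radians.

    head_yaw is the facing direction in world coordinates. The result is
    wrapped to [-pi, pi]; positive means the speaker is to the listener's left.
    """
    world = math.atan2(speaker_xy[1] - head_xy[1], speaker_xy[0] - head_xy[0])
    rel = world - head_yaw
    return math.atan2(math.sin(rel), math.cos(rel))

def pan_gains(azimuth):
    """Constant-power (left, right) gains from a head-relative azimuth.

    Only the lateral component is used, so front and rear sources at the
    same side receive the same gains. This is a simplification.
    """
    lateral = math.sin(azimuth)            # +1 fully left, -1 fully right
    theta = (1.0 - lateral) * math.pi / 4  # 0 = fully left, pi/2 = fully right
    return math.cos(theta), math.sin(theta)
```

When the listener faces the speaker, both gains are equal. If the head then turns 90 degrees to the left, the same speaker is rendered almost entirely in the right ear, so the sound appears to stay anchored in the room rather than moving with the head.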
In one aspect, the audio signals can be upmixed or downmixed by a mixer (#41). For example, one or more audio signals from the audio content source can be mixed to a desired audio format, for example, 5.1 surround, 7.1 surround, or other configurations.
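Mixing between channel formats is commonly expressed as a matrix of gains, with one row per output channel. The Python sketch below shows that idea with a 5.1-to-stereo downmix. The channel order, the names mix_frame and DOWNMIX_51_TO_STEREO, and the roughly -3 dB coefficients are assumptions based on common practice, not details taken from the patent.

```python
import math

# Assumed channel order for one input frame; not specified in the patent.
CHANNELS_51 = ("L", "R", "C", "LFE", "Ls", "Rs")

G = 1.0 / math.sqrt(2.0)  # about -3 dB

# One row per output channel, one gain per input channel in CHANNELS_51 order.
# The LFE channel is dropped here, as many stereo downmixes do.
DOWNMIX_51_TO_STEREO = (
    (1.0, 0.0, G, 0.0, G, 0.0),  # left output
    (0.0, 1.0, G, 0.0, 0.0, G),  # right output
)

def mix_frame(frame, matrix):
    """Apply a gain matrix to one frame (one sample per input channel)."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("each matrix row needs one gain per input channel")
    return [sum(g * s for g, s in zip(row, frame)) for row in matrix]
```

An upmix or a conversion to another format such as 7.1 would use the same function with a different matrix, one with more rows than the input has channels.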
In one aspect, the process shown in FIG. 1 above can be repeated for multiple users in the same listening area. For example, separate sets of one or more virtual speakers can be generated and assigned to multiple users, each wearing a headphone set.
The assignment of locations of the virtual speakers can be the same, or different from one user to the other. In other words, one of the one or more virtual speakers of a first user can have a location that is different from any and all of the one or more locations of the one or more virtual speakers of a second user.
For example, FIG. 3 below shows user #1, who can be listening to virtual speakers 63, 62, 64, 65 and 66, which have been generated for user 1. A second user (user #2) can be assigned a separate set of virtual speakers, some sharing the same locations (e.g., 63, 62 and 64) and others having different assigned locations, such as speakers 67 and 69. The heads of users 1 and 2 can be tracked independently to continuously update the spatializing of each user's audio signals.
Apple's patent FIG. 5 above illustrates system hardware such as AirPods Max headphones, an iPhone/iPad, a television and an Apple TV box. Apple notes that the definition of a "television" can be expanded beyond a standalone TV.
"A 'television' shall be regarded as interchangeable with a laptop having a display, a tablet computer, a projected display projected onto a surface by a projector, a computer monitor, or other devices with display means. All aspects discussed with regard to a 'television' also apply to these other forms of 'television.'"
Apple's patent FIG. 5 also reveals that future versions of both AirPods Max and Apple TV could integrate cameras so as to ascertain the best positioning of virtual speakers in a room.
For more details, review Apple's granted patent 11,432,095.
In the future, it would be appreciated if AirPods Max allowed two people in a room watching a show on TV to communicate with each other without having to pause the show or remove the headphones. Intercommunication between headphones would be a much-appreciated feature.