Apple invents a new AirPods feature that allows two people in a busy environment to have a conversation and clearly hear each other
In a new patent application published today, Apple notes that a user having a conversation with someone nearby in a noisy environment, such as a restaurant, bar, airplane, or bus, may find it difficult to hear and understand the other person. A solution that may reduce this effort is to wear headphones that passively isolate the wearer from the noisy environment but also actively reproduce the other person's voice through the headphone's speaker. This is referred to as a transparency mode of operation.
A Conversation Detector for AirPods
An aspect of Apple’s invention is a signal processing technique referred to as a conversation detector, or conversation detect process. The conversation detector is a digital signal processing technique that operates on one or more external microphone signals of the headphone, and perhaps one or more other sensor signals, such as those produced by an audio accelerometer or bone conduction sensor, to decide when to activate or trigger a transparency mode of operation. Ideally, the mode should be active only during an actual conversation between the wearer of the headphone and another talker in the same ambient environment.
The talker (referred to here as the “other talker”) is a person who is nearby, for instance within two meters of the headphone wearer. The other talker may be standing next to the wearer, or sitting across a table or side by side, for instance in a dining establishment, in the same train car, or on the same bus as the wearer.
In one aspect, the transparency mode activates a conversation-focused transparency signal processing path (C-F transparency) in which one or more of the microphone signals of the headphone are processed to produce a conversation-focused transparency audio signal which is input to a speaker of the headphone.
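To make that signal flow concrete, here is a minimal sketch of how such a C-F transparency path could route audio from the external microphones, through a filter block, to the speaker. The closure-based filter, frame types, and speaker sink are illustrative assumptions, not details from the filing.

```swift
/// A minimal sketch of the conversation-focused (C-F) transparency path:
/// external microphone frames are run through a filter block and the result
/// is written to the headphone speaker. All names and types here are
/// illustrative assumptions.
struct ConversationFocusedTransparencyPath {
    /// Filter block that turns external mic frames into the C-F transparency signal.
    var filterBlock: ([[Float]]) -> [Float]
    /// Sink that plays an audio frame on the headphone speaker.
    var playOnSpeaker: ([Float]) -> Void

    /// Process one block of external microphone signals while the mode is active.
    func process(externalMicFrames: [[Float]], transparencyActive: Bool) {
        guard transparencyActive else { return }
        let transparencySignal = filterBlock(externalMicFrames)
        playOnSpeaker(transparencySignal)
    }
}
```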
The conversation detector may declare that the conversation has ended more accurately than by relying solely on the absence of own voice activity. To declare the conversation ended, the conversation detector may implement an own voice activity detector (OVAD) and a target voice activity detector (TVAD), whose inputs are one or more of the microphone signals and, when available, one or more other sensor signals. The OVAD and the TVAD detect own-voice activity (the wearer is talking) and far-field target voice activity (the other talker is speaking).
The conversation detector monitors the duration during which the OVAD and the TVAD simultaneously indicate no activity, and may declare the end of the conversation in response to that duration exceeding an idle threshold.
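As a rough illustration of this end-of-conversation logic (not Apple's actual implementation), the sketch below keeps a running idle timer that resets whenever either detector fires. The frame-based interface and the eight-second idle threshold are assumptions made for the example.

```swift
import Foundation

/// A minimal sketch of the end-of-conversation logic described in the patent.
/// The detector names (OVAD/TVAD) come from the filing; the frame-based API
/// and the threshold value are illustrative assumptions.
struct ConversationDetector {
    /// Seconds of simultaneous silence after which the conversation is declared over.
    var idleThreshold: TimeInterval = 8.0
    private var idleDuration: TimeInterval = 0
    private(set) var inConversation = false

    /// Feed one analysis frame of detector decisions.
    /// - Parameters:
    ///   - ownVoiceActive: OVAD decision for this frame (the wearer is talking).
    ///   - targetVoiceActive: TVAD decision for this frame (the other talker is speaking).
    ///   - frameDuration: length of the analysis frame in seconds.
    mutating func update(ownVoiceActive: Bool,
                         targetVoiceActive: Bool,
                         frameDuration: TimeInterval) -> Bool {
        if ownVoiceActive || targetVoiceActive {
            idleDuration = 0
            inConversation = true          // any voice activity (re)starts the conversation
        } else if inConversation {
            idleDuration += frameDuration  // both detectors silent: accumulate idle time
            if idleDuration > idleThreshold {
                inConversation = false     // declare the conversation ended
                idleDuration = 0
            }
        }
        return inConversation
    }
}
```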
The conversation detector thus helps not only to reduce power consumption, which is particularly relevant in wireless headphones, but also to reduce the instances of distortion that might be introduced by the conversation-focused transparency signal processing path. It can advantageously prevent the mode from being activated in unsuitable situations.
A filter block produces the conversation-focused transparency audio signal by enhancing or isolating the speech of the other talker. This may be done in many ways, e.g., by processing two or more external microphone signals (from two or more external microphones, respectively) of the headset using sound pickup beamforming to perform spatially selective sound pickup in a primary lobe having an angular spread of less than 180 degrees in front of the wearer. It may be performed using knowledge-based statistical or deterministic algorithms, or using data-driven techniques such as machine learning (ML) model processing, or any combination of the above.
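As one concrete illustration of the beamforming option, the sketch below implements a basic two-microphone delay-and-sum beamformer steered toward the front of the wearer. The microphone spacing, sample rate, and endfire geometry are assumptions; the patent leaves the actual filter design open, and a real headset would use calibrated array geometry and more sophisticated adaptive processing.

```swift
/// A minimal delay-and-sum beamformer sketch that favors sound arriving from
/// in front of the wearer, assuming a front mic and a rear mic spaced along
/// the front-back axis. All parameter values are illustrative assumptions.
func frontFacingBeamform(frontMic: [Float],
                         rearMic: [Float],
                         micSpacingMeters: Float = 0.015,
                         sampleRate: Float = 48_000,
                         speedOfSound: Float = 343) -> [Float] {
    // Sound from the front reaches the front mic first; delaying the front
    // mic signal by the inter-mic travel time aligns it with the rear mic
    // for frontal arrivals, so they sum coherently, while off-axis arrivals
    // are progressively attenuated at higher frequencies.
    let delaySamples = Int((micSpacingMeters / speedOfSound * sampleRate).rounded())
    let n = min(frontMic.count, rearMic.count)
    var output = [Float](repeating: 0, count: n)
    for i in 0..<n {
        let delayedFront = i >= delaySamples ? frontMic[i - delaySamples] : 0
        output[i] = 0.5 * (delayedFront + rearMic[i])
    }
    return output
}
```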
In one aspect, when the conversation detector declares an end to the conversation, the transparency mode is deactivated at that point. That means, for example, deactivating the conversation-focused transparency audio signal. In one aspect, the transparency mode is deactivated by also activating an anti-noise signal (or by raising selected frequency-dependent gains, or the scalar gain, of the anti-noise signal). In other aspects, entering and exiting the transparency mode during media playback (e.g., music playback, movie soundtrack playback) changes how the media playback signal is rendered.
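A minimal sketch of that exit behavior follows, assuming a short linear crossfade from the transparency signal to the anti-noise signal; the ramp length and the per-sample mixing interface are illustrative assumptions, not details from the filing.

```swift
/// A minimal sketch of exiting the transparency mode: the conversation-focused
/// transparency signal is faded out while the anti-noise (ANC) signal's scalar
/// gain is raised. Ramp length and API shape are illustrative assumptions.
struct TransparencyExitRamp {
    let rampSamples: Int
    private var position = 0

    init(rampSeconds: Double = 0.25, sampleRate: Double = 48_000) {
        rampSamples = max(1, Int(rampSeconds * sampleRate))
    }

    /// Mix one output sample while the crossfade is in progress.
    mutating func mix(transparencySample: Float, antiNoiseSample: Float) -> Float {
        // Linear crossfade: transparency gain ramps 1 -> 0, anti-noise gain 0 -> 1.
        let t = Float(min(position, rampSamples)) / Float(rampSamples)
        if position < rampSamples { position += 1 }
        return (1 - t) * transparencySample + t * antiNoiseSample
    }

    var finished: Bool { position >= rampSamples }
}
```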
Apple’s patent FIG. 1 below depicts an example of a headphone wearer who is trying to listen to another talker in their ambient sound environment; FIG. 2 is a block diagram of a headphone having digital audio signal processing that implements a transparency mode of operation for the headphone in which the wearer's speech listening effort is advantageously reduced; FIG. 5 illustrates how an example conversation detector declares a conversation based on matching target speaker identification models that have been produced for respective portions of a microphone signal.
Apple’s patent FIGS. 7a-7d above illustrate how the sound pickup aperture of a conversation detector expands and shrinks in response to yaw angle changes of the headphone wearer's head.
Apple's patent application 20240365040, published today, was originally filed on March 29, 2024.