On April 3, 2014, the US Patent & Trademark Office published a patent application from Apple titled "System and Method of Detecting a User's Voice Activity using an Accelerometer." It appears that Apple is working to advance their EarPod with Remote and Mic headset so as to provide superior voice quality for phone calls in the future. Technically speaking, the new system will have a voice activity detector (VAD) that uses signals from an accelerometer included in the earbuds of a headset with a microphone array to detect the user's speech and to steer one or more beamformers.
Apple's Patent Background
Currently, a number of consumer electronic devices are adapted to receive speech via microphone ports or headsets. While the typical example is a portable telecommunications device (mobile telephone), with the advent of Voice over IP (VoIP), desktop computers, laptop computers and tablet computers may also be used to perform voice communications.
When using these electronic devices, the user also has the option of using the speakerphone mode or a wired headset to receive his speech. However, a common complaint with these hands-free modes of operation is that the speech captured by the microphone port or the headset includes environmental noise such as secondary speakers in the background or other background noises. This environmental noise often renders the user's speech unintelligible and thus, degrades the quality of the voice communication.
Similarly, when these electronic devices are used in a non-speaker phone mode which requires the user to hold the electronic device's earphone portion to the user's ear ("at ear position"), the speech that is captured by the microphone port may also be rendered unintelligible due to environmental noise.
Apple to Advance EarPod Voice Quality with new Mic System using an Accelerometer
Apple's invention generally relates to using signals from an accelerometer included in an earbud of an enhanced headset like their EarPod headset, for use with electronic devices to detect a user's voice activity.
Placed in the user's ear canal, the accelerometer may detect the vibrations of the user's vocal cords during speech. These accelerometer signals are used in combination with the acoustic signals received by microphones in the earbuds and a microphone array in the headset wire: a coincidence, defined as an "AND" function between a movement detected by the accelerometer and the voiced speech in the acoustic signals, may indicate that the user's voiced speech is detected.
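To make the "AND" coincidence concrete, here is a minimal Python sketch. The energy-based detectors, the function names and the threshold values are our own assumptions for illustration; this summary of the filing does not say how each individual detector decides that speech or movement is present.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def voiced_vad(
    accel_frame: Sequence[float],
    mic_frame: Sequence[float],
    accel_threshold: float = 1e-4,  # hypothetical value, not from the filing
    mic_threshold: float = 1e-3,    # hypothetical value, not from the filing
) -> int:
    """Return 1 only when both the accelerometer and the microphone are active."""
    accel_active = frame_energy(accel_frame) > accel_threshold
    mic_active = frame_energy(mic_frame) > mic_threshold
    # The coincidence is a logical AND of the two detections.
    return int(accel_active and mic_active)
```

With this arrangement a loud bystander (microphone active, accelerometer quiet) or a head movement in silence (accelerometer active, microphone quiet) would both leave the output at 0.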
Unvoiced Speech Detection
Apple notes that when a coincidence is obtained, a voice activity detector (VAD) output may indicate that the user's voiced speech is detected. In addition to the user's voiced speech, the user's speech may also include unvoiced speech, which is speech that is generated without vocal-cord vibrations (e.g., sounds such as /s/, /sh/, /f/). In order for the VAD output to indicate that unvoiced speech is detected, a signal from a microphone in the earbuds or a microphone in the microphone array or the output of a beamformer may be used.
A high-pass filter is applied to the signal from the microphone or beamformer and if the resulting power is above a threshold, the VAD output may indicate the user's unvoiced speech is detected.
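The filter-then-threshold step can be sketched in a few lines of Python. The first-order filter, the `alpha` coefficient and the threshold below are illustrative assumptions on our part; the filing as summarized here does not specify the filter design or the threshold value.

```python
from typing import List, Sequence


def high_pass(samples: Sequence[float], alpha: float = 0.9) -> List[float]:
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    out = []
    prev_x = samples[0]  # start from the first sample to avoid a startup transient
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def unvoiced_detected(
    samples: Sequence[float],
    threshold: float = 0.01,  # hypothetical power threshold
    alpha: float = 0.9,       # hypothetical filter coefficient
) -> bool:
    """True when the power remaining after high-pass filtering exceeds the threshold."""
    filtered = high_pass(samples, alpha)
    if not filtered:
        return False
    power = sum(y * y for y in filtered) / len(filtered)
    return power > threshold
```

The idea is that hissing sounds such as /s/ carry most of their energy at high frequencies, so they survive the filter, while slowly varying content is largely removed.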
New Noise Suppressor
Apple further notes that a noise suppressor may receive the acoustic signals as received from the microphone array beamformer and may suppress the noise from the acoustic signals or beamformer based on the VAD output. Further, based on this VAD output, one or more beamformers may also be steered such that the microphones in the earbuds and in the microphone array emphasize the user's speech signals and deemphasize the environmental noise.
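One common way a VAD output can drive noise suppression is to learn the noise level only while the user is silent and then attenuate each frame accordingly. The Python sketch below follows that idea; the class name, the single broadband gain, the smoothing factor and the gain floor are all assumptions made for illustration and are not taken from the filing.

```python
from typing import List, Sequence


class NoiseSuppressor:
    """Toy VAD-driven suppressor using one broadband gain per frame."""

    def __init__(self, smoothing: float = 0.9, gain_floor: float = 0.1) -> None:
        self.smoothing = smoothing    # hypothetical smoothing factor
        self.gain_floor = gain_floor  # hypothetical minimum gain
        self.noise_power = 0.0

    def process(self, frame: Sequence[float], vad: int) -> List[float]:
        power = sum(x * x for x in frame) / len(frame) if frame else 0.0
        if vad == 0:
            # No user speech detected: treat this frame as noise and update the estimate.
            self.noise_power = (
                self.smoothing * self.noise_power + (1.0 - self.smoothing) * power
            )
        if power > 0.0:
            gain = max(self.gain_floor, 1.0 - self.noise_power / power)
        else:
            gain = self.gain_floor
        return [gain * x for x in frame]
```

Frames that are mostly noise end up scaled down to the gain floor, while frames whose power is well above the noise estimate pass through almost unchanged.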
Adding a New Accelerometer Aspect to EarPods
According to Apple, in another embodiment of the invention, a method of detecting a user's voice activity in a headset with a microphone array starts with a voice activity detector (VAD) generating a VAD output based on (i) acoustic signals received from microphones included in a pair of earbuds and the microphone array included on a headset wire and (ii) data output by a sensor detecting movement that is included in the pair of earbuds. The headset may include the pair of earbuds and the headset wire.
The VAD output may be generated by detecting speech included in the acoustic signals, detecting a user's speech vibrations from the data output by the accelerometer, and detecting a coincidence of the detected speech in the acoustic signals and the user's speech vibrations. The VAD output is set to indicate that the user's voiced speech is detected if the coincidence is detected, and to indicate that the user's voiced speech is not detected if it is not. A noise suppressor may then receive (i) the acoustic signals from the microphone array and (ii) the VAD output and suppress the noise included in the acoustic signals received from the microphone array based on the VAD output. The method may also include steering one or more beamformers based on the VAD output. The beamformers may be adaptively steered or the beamformers may be fixed and steered to a set location.
In another embodiment of the invention, a system detecting a user's voice activity comprises a headset, a voice activity detector (VAD) and a noise suppressor. The headset may include a pair of earbuds and a headset wire. Each of the earbuds may include earbud microphones and a sensor detecting movement such as an accelerometer. The headset wire may include a microphone array. The VAD may be coupled to the headset and may generate a VAD output based on (i) acoustic signals received from the earbud microphones, the microphone array or beamformer and (ii) data output by the sensor detecting movement. The noise suppressor may be coupled to the headset and the VAD and may suppress noise from the acoustic signals from the microphone array based on the VAD output.
In another embodiment of the invention, a method of detecting a user's voice activity in a mobile device starts with a voice activity detector (VAD) generating a VAD output based on (i) acoustic signals received from microphones included in the mobile device and (ii) data output by an inertial sensor that is included in an earphone portion of the mobile device. The inertial sensor detects vibration of the user's vocal cords modulated by the user's vocal tract based on vibrations in bones and tissue of the user's head. In this embodiment, the inertial sensor, being located in the earphone portion of the mobile device, may detect the vibrations at the user's ear or in the area proximate to the user's ear.
Apple's patent FIG. 1 illustrates an example of the headset in use; FIG. 2 illustrates an example of the right side of the headset used with a consumer electronic device that includes an accelerometer.
As shown in FIG. 16 below, when the VAD output is set to 1, the first beamformer is adaptively steered such that the main beam is directed towards the user's mouth and maintained in that direction when the VAD output is set to 0.
According to Apple, a third beamformer detects the directions of the main environment noise locations when the VAD output is set to 0. Using the directions detected by the third beamformer, the nulls of the first beamformer are adaptively steered in these directions of the main environment noise locations. Accordingly, the first beamformer emphasizes the user's speech using the main beam and deemphasizes the noise locations using the nulls.
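The control logic described for the first beamformer can be sketched as a small state update in Python. The direction-of-arrival inputs are assumed to come from elsewhere (for example, from the third beamformer in the case of noise), and the names `BeamSteeringState` and `update_steering` are hypothetical; the sketch covers only when the main beam and the nulls are updated, not the beamforming itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class BeamSteeringState:
    main_beam_deg: float = 0.0  # direction of the main beam
    null_deg: List[float] = field(default_factory=list)  # directions of the nulls


def update_steering(
    state: BeamSteeringState,
    vad: int,
    speech_doa_deg: Optional[float] = None,
    noise_doas_deg: Sequence[float] = (),
) -> BeamSteeringState:
    """Update the main beam while VAD is 1 and the nulls while VAD is 0."""
    if vad == 1:
        # User is speaking: point the main beam at the estimated speech direction.
        if speech_doa_deg is not None:
            state.main_beam_deg = speech_doa_deg
    else:
        # User is silent: hold the main beam and aim the nulls at the noise sources.
        state.null_deg = list(noise_doas_deg)
    return state
```

Holding the main beam during silence keeps it from drifting toward a noise source when no user speech is available to track.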
Apple's patent FIG. 14 illustrates a block diagram of a system detecting a user's voice activity.
According to Apple, in one embodiment, the microphone and speaker ports may be coupled to the communications circuitry to enable the user to participate in wireless telephone calls. As further illustrated in patent FIG. 22, the iPhone may include an inertial sensor that is included in an earphone portion of the iPhone.
The inertial sensor may be an accelerometer that detects vibration of the user's vocal cords modulated by the user's vocal tract based on vibrations in bones and tissue of the user's head.
In one embodiment, the accelerometer has a sampling rate greater than 2000 Hz. In another embodiment, the sampling rate of the accelerometer may be between 2000 Hz and 6000 Hz.
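As a quick aid for reading those figures, the standard sampling theorem says a sensor can only represent vibration frequencies up to half its sampling rate. The small Python helper below is our own illustration of that arithmetic and is not taken from the filing.

```python
def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate (half the rate)."""
    return sampling_rate_hz / 2.0


# Sampling rates quoted in the filing, and the vibration bandwidth each one allows.
for rate_hz in (2000.0, 6000.0):
    print(f"{rate_hz:.0f} Hz sampling -> up to {nyquist_limit_hz(rate_hz):.0f} Hz")
```

So an accelerometer sampled at 2000 Hz can capture vibrations up to 1000 Hz, and one sampled at 6000 Hz can capture vibrations up to 3000 Hz.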
Apple credits Sorin Dusan, Esge Andersen, Aram Lindahl and Andrew Bright as the inventors of patent application 20140093093, which was originally filed in Q1 2013. Considering that this is a patent application, the timing of such a product to market is unknown at this time.
Patently Apple presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. Revelations found in patent applications shouldn't be interpreted as rumor or fast-tracked according to rumor timetables.