Apple Files an iPhone-Based Speech-to-Text Solution Patent
In a May 2009 Apple Discussions forum thread titled "Speech to Text Yet?", a user asked: "I simply want to enter ISBN numbers from books into a word processing window by speaking them. Is there anything available yet for Macintosh that allows me to do this simple task?" Well, it appears that Apple has something in the works on that very front, indeed. Today, the US Patent & Trademark Office published an Apple patent application that generally relates to deriving text data from speech data.

Some software solutions today enable the user to enter text using speech, converting the spoken words to text with a speech recognition engine. However, these solutions can be difficult to use when entering symbolic characters, style or typeface input, because they typically require escape sequences to exit the speech input mode and then additional input to return to it. Apple's solution pairs a speech recognition module with a text composition module on the iPhone.
A Speech-to-Text Example
As shown in FIG. 1 above (click to enlarge), the editing interface runs on the iPhone and includes a virtual keyboard that can be in English and/or any number of foreign languages. In the depicted example, the user can select a "begin speech input" selection 150 to enable the iPhone to receive speech input. After the begin speech input selection is selected, the iPhone can receive speech data from the microphone, noted at the bottom of the iPhone as 160. In some implementations, the speech input can be processed in real time. In other implementations, the speech input can be recorded for subsequent processing.
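To make that begin/stop flow concrete, here is a minimal Swift sketch of the two processing modes the filing mentions. The SpeechSession type, its method names and the stubbed recognize function are our own illustration of the idea, not Apple's code; the patent leaves the recognition engine itself unspecified.

```swift
import Foundation

enum ProcessingMode {
    case realTime   // transcribe audio buffers as they arrive
    case recorded   // buffer the whole utterance, transcribe afterwards
}

final class SpeechSession {
    let mode: ProcessingMode
    private(set) var isActive = false
    private var recordedAudio = Data()

    /// Called with recognized text: per buffer in real-time mode,
    /// once after the session ends in recorded mode.
    var onText: (String) -> Void

    init(mode: ProcessingMode, onText: @escaping (String) -> Void) {
        self.mode = mode
        self.onText = onText
    }

    func begin() { isActive = true }                 // "begin speech input" (150)

    func receive(audio buffer: Data) {               // microphone data (160)
        guard isActive else { return }
        switch mode {
        case .realTime: onText(recognize(buffer))
        case .recorded: recordedAudio.append(buffer)
        }
    }

    func stop() {                                    // "stop speech input" (170)
        isActive = false
        if mode == .recorded {
            onText(recognize(recordedAudio))
            recordedAudio.removeAll()
        }
    }

    // Stand-in for a real speech recognition engine.
    private func recognize(_ audio: Data) -> String { "<recognized text>" }
}
```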
As shown in FIG. 1B, the input window 130 includes an example message to Sophia. As an illustrative example, to input the depicted message, the user began by speaking "Sophia" followed by a selection of the comma character representation (",") and two sequential carriage return character representations. Next, the user spoke "can you go to the store after work to pick up" and then selected the colon character representation (":"), two sequential carriage return character representations, a dash character representation ("-") and a space character representation. After inputting the space character representation, the user continued by speaking "milk" followed by one carriage return character representation. After inputting another dash character representation followed by a space character representation, the user spoke "salmon" and selected two sequential carriage return character representations.

Next, the user spoke "remember" and then selected a comma and a space character representation. Next, the user enabled the bold style using the style selections 190. Because the bold style is selected, the sentence spoken by the user, "John and Jane are coming over tonight," is displayed in bold. The user then deselected the bold style by selecting the bold selection representation. The user then selected a comma and a space character representation and spoke "so you need to be back by." The user then selected a tilde character representation, a six character representation, a colon character representation, a three character representation, a zero character representation and an exclamation point character representation using the virtual keyboard (i.e., "~6:30!"). The user can then select a "stop speech input" selection representation 170, which can return the user to a non-speech text editing interface (e.g., interface 110 of FIG. 1A). In some implementations, the user can edit the text displayed in the input window 130 using the virtual keyboard 140.
Thus, in the example above, the user entered speech and non-speech input during the speech input session. The speech and non-speech input were then processed and combined to provide input to a currently selected application (e.g., electronic mail). The input did not require the user to speak or type any special phrases or keystrokes to access non-speech characters, nor any subsequent editing to insert the non-speech characters into the text data derived from the speech data.
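A rough Swift sketch of how a composition module might fold such a stream of speech segments, character selections and style toggles into one message is shown below. The InputEvent and compose names are hypothetical; the patent publishes no code, and a real implementation would emit styled (attributed) text rather than markers.

```swift
// A hypothetical model of merging speech-derived text with non-speech input.
enum InputEvent {
    case speech(String)        // text produced by the recognition engine
    case character(Character)  // a symbol tapped on the virtual keyboard (140)
    case toggleBold            // the bold style selection (190)
}

func compose(_ events: [InputEvent]) -> String {
    var output = ""
    var bold = false
    for event in events {
        switch event {
        case .speech(let text):
            // Markers stand in for the bold attribute in this sketch.
            output += bold ? "**\(text)**" : text
        case .character(let c):
            output += String(c)
        case .toggleBold:
            bold.toggle()
        }
    }
    return output
}

// The opening of the example message to Sophia, replayed as events:
let message = compose([
    .speech("Sophia"), .character(","), .character("\n"), .character("\n"),
    .speech("can you go to the store after work to pick up"), .character(":"),
    .character("\n"), .character("\n"),
    .character("-"), .character(" "), .speech("milk"), .character("\n"),
    .character("-"), .character(" "), .speech("salmon"),
])
print(message)
```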
Apple's patent FIG. 4 is a block diagram of example editing interface instructions for communicating with a speech to text composition server.
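As a sketch of what the client side of that arrangement could look like, the snippet below hands recorded audio to a remote server and receives text back. The endpoint, path and plain-text response format are invented for illustration; the filing only describes the client/server split, not a protocol.

```swift
import Foundation

// Hypothetical client for a speech-to-text composition server.
func requestTranscription(of audio: Data,
                          from server: URL,
                          completion: @escaping (String?) -> Void) {
    var request = URLRequest(url: server.appendingPathComponent("transcribe"))
    request.httpMethod = "POST"
    request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")
    request.httpBody = audio

    URLSession.shared.dataTask(with: request) { data, _, _ in
        // Assume the server replies with the recognized text as UTF-8.
        completion(data.flatMap { String(data: $0, encoding: .utf8) })
    }.resume()
}
```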
Apple credits Kazuhisa Yanagihara as the sole inventor of patent application 20090216531.
NOTICE: Patently Apple presents only a brief summary of patents with associated graphic(s) for journalistic news purposes as each such patent application and/or grant is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application and/or grant should be read in its entirety for further details. For additional information on today's patent(s), simply feed the individual patent number(s) noted above into this search engine.