Apple introduces us to Siri, the Killer Patent
On October 4, 2011, Apple launched the iPhone 4S with Siri, just one day prior to Steve Jobs' passing. Today, the first killer patent application behind Siri was published by the US Patent and Trademark Office. It's clear that Apple's breakthrough technology is destined to go far beyond the iPhone and into devices like the iMac and a future HDTV. The timing of this patent application is appropriate, given that we just posted a report on Tuesday titled "Steve Jobs Credited with an Apple TV Patent for Episodic TV." The patent also reveals that Apple envisions the technology playing a role in vehicles and in-vehicle entertainment systems, where an Intelligent Assistant will be considered the king of user interfaces. Apple's patent shows us that Siri could be configured to work with various new scenarios and even act as an instructor when we purchase future devices. Forget using a manual: Siri will simply teach us what we want to know about our new devices when we're ready to ask it a question about a new function or feature. Today we get a look behind the magic of Siri, and it is simply mind-boggling. Report Updated, 2:45 PM MST: Siri Trademark filing information added.
Before Siri, There was Apple's 1987 Knowledge Navigator Concept
According to Wikipedia, Apple's Knowledge Navigator Concept of 1987 described "a device that could access a large networked database of hypertext information, and use software agents to assist searching for information."
Apple produced several concept videos showcasing the idea and one of them is presented below. All of the videos featured a tablet-style computer with numerous advanced capabilities, including an excellent text-to-speech system with no hint of "computerese", a gesture-based interface resembling the multitouch interface now used on the iPhone and an equally powerful speech understanding system, allowing the user to converse with the system via an animated "butler" as the software agent.
One of the inventors of Siri noted on today's patent application is Tom Gruber. In his interview with TechCrunch's Nova Spivack a year ago, he was asked the question "What are some of the examples that have influenced your thinking?"
In part, Gruber stated that "the idea of interacting with a computer via a conversational interface with an assistant has excited the imagination for some time. Apple's famous Knowledge Navigator video offered a compelling vision, in which a talking head agent helped a professional deal with schedules and access information on the net."
It's clear that Apple had this vision of the intelligent assistant for some time and waaaaaay ahead of devices like Microsoft's Kinect.
Apple's Patent Background
Today's patent report begins with Apple's patent background. According to Apple, electronic devices are able to access a large, growing, and diverse quantity of functions, services, and information, both via the Internet and from other sources. Functionality for such devices is increasing rapidly, as many consumer devices, smartphones, tablet computers, and the like, are able to run software applications to perform various tasks and provide different types of information. Often, each application, function, website, or feature has its own user interface and its own operational paradigms, many of which could be burdensome to learn or overwhelming for users. In addition, many users may have difficulty even discovering what functionality and/or information is available on their electronic devices or on various websites; thus, such users may become frustrated or overwhelmed, or may simply be unable to use the resources available to them in an effective manner.
In particular, novice users, or individuals who are impaired or disabled in some manner, and/or are elderly, busy, distracted, and/or operating a vehicle may have difficulty interfacing with their electronic devices effectively, and/or engaging online services effectively. Such users are particularly likely to have difficulty with the large number of diverse and inconsistent functions, applications, and websites that may be available for their use.
Accordingly, existing systems are often difficult to use and to navigate, and often present users with inconsistent and overwhelming interfaces that often prevent the users from making effective use of the technology.
Apple's Solution: Siri
Apple's invention relates to an intelligent automated assistant implemented on an electronic device, to facilitate user interaction with a device, and to help the user more effectively engage with local and/or remote services. In various embodiments, the intelligent automated assistant engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
The Conversation Interface
According to various embodiments of the present invention, the intelligent automated assistant integrates a variety of capabilities provided by different software components (e.g., for supporting natural language recognition and dialog, multimodal input, personal information management, task flow management, orchestrating distributed services, and the like). Furthermore, to offer intelligent interfaces and useful functionality to users, the intelligent automated assistant of the present invention may, in at least some embodiments, coordinate these components and services. The conversation interface, and the ability to obtain information and perform follow-on tasks, are implemented, in at least some embodiments, by coordinating various components such as language components, dialog components, task management components, information management components and/or a plurality of external services.
Siri is Configurable
According to Apple, intelligent automated assistant systems may be configured, designed, and/or operable to provide various different types of operations, functionalities, and/or features, and/or to combine a plurality of features, operations, and applications of an electronic device on which it is installed. In some embodiments, the intelligent automated assistant systems of the present invention could perform any or all of: actively eliciting input from a user, interpreting user intent, disambiguating among competing interpretations, requesting and receiving clarifying information as needed, and performing (or initiating) actions based on the discerned intent.
Siri, the Teacher
Actions could be performed, for example, by activating and/or interfacing with any applications or services that may be available on an electronic device, as well as services that are available over an electronic network such as the Internet. In various embodiments, such activation of external services could be performed via APIs or by any other suitable mechanism. In this manner, the intelligent automated assistant systems of various embodiments of the present invention could unify, simplify, and improve the user's experience with respect to many different applications and functions of an electronic device, and with respect to services that may be available over the Internet. The user could thereby be relieved of the burden of learning what functionality may be available on the device and on web-connected services, how to interface with such services to get what he or she wants, and how to interpret the output received from such services; rather, the assistant of the present invention could act as a go-between between the user and such diverse services.
Siri's Short and Long Term Memory
In addition, in various embodiments, the assistant of the present invention provides a conversational interface that the user may find more intuitive and less burdensome than conventional graphical user interfaces. The user could engage in a form of conversational dialog with the assistant using any of a number of available input and output mechanisms, such as for example speech, graphical user interfaces (buttons and links), text entry, and the like. The system could be implemented using any of a number of different platforms, such as device APIs, the web, email, and the like, or any combination thereof. Requests for additional input could be presented to the user in the context of such a conversation. Short and long term memory could be engaged so that user input could be interpreted in proper context given previous events and communications within a given session, as well as historical and profile information about the user.
Understanding the Context of a Conversation
In addition, in various embodiments, context information derived from user interaction with a feature, operation, or application on a device could be used to streamline the operation of other features, operations, or applications on the device or on other devices. For example, the intelligent automated assistant could use the context of a phone call (such as the person called) to streamline the initiation of a text message (for example to determine that the text message should be sent to the same person, without the user having to explicitly specify the recipient of the text message). The intelligent automated assistant of the present invention could thereby interpret instructions such as "send him a text message", wherein the "him" is interpreted according to context information derived from a current phone call, and/or from any feature, operation, or application on the device. In various embodiments, the intelligent automated assistant takes into account various types of available context data to determine which address book contact to use, which contact data to use, which telephone number to use for the contact, and the like, so that the user need not re-specify such information manually.
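The "send him a text message" example can be illustrated with a small sketch. This is not Apple's implementation; it is a minimal illustration that assumes a hypothetical DialogContext object holding the person on the current call and a list of recently used contacts. The names Contact, DialogContext, PRONOUNS and resolve_recipient are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    mobile: str


@dataclass
class DialogContext:
    """Hypothetical context gathered from recent activity on the device."""
    current_call: Optional[Contact] = None
    recent_contacts: List[Contact] = field(default_factory=list)


# Assumption: only these pronouns are treated as references to a person.
PRONOUNS = {"him", "her", "them"}


def resolve_recipient(utterance: str, context: DialogContext) -> Optional[Contact]:
    """Return the contact a pronoun most plausibly refers to, or None."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    if not words & PRONOUNS:
        return None  # no pronoun, so nothing to resolve from context
    if context.current_call is not None:
        return context.current_call  # prefer the person on the current call
    if context.recent_contacts:
        return context.recent_contacts[-1]  # fall back to the most recent contact
    return None  # the assistant would have to ask the user


ctx = DialogContext(current_call=Contact("John", "555-0100"))
recipient = resolve_recipient("Send him a text message", ctx)
```

Here recipient is the contact from the current call, so the user never has to name the recipient. When no context is available the function returns None, which is the point at which a conversational assistant would ask a clarifying question.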
Siri Could Work with External E-Commerce, Local and Travel Services
In various embodiments, the assistant could also take into account external events and respond accordingly, for example, to initiate action, initiate communication with the user, provide alerts, and/or modify previously initiated action in view of the external events. If input is required from the user, a conversational interface could again be used.
In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system could interact. In various embodiments, these external services include web-enabled services, as well as functionality related to the hardware device itself. For example, in an embodiment where the intelligent automated assistant is implemented on a smartphone, personal digital assistant, tablet computer, or other device, the assistant could control many operations and functions of the device, such as to dial a telephone number, send a text message, set reminders, add events to a calendar, and the like.
In various embodiments, the system of the present invention could be implemented to provide assistance in any of a number of different domains. Examples include: Local Services (including location- and time-specific services such as restaurants, movies, automated teller machines (ATMs), events, and places to meet); Personal and Social Memory Services (including action items, notes, calendar events, shared links, and the like); E-commerce (including online purchases of items such as books, DVDs, music, and the like); Travel Services (including flights, hotels, attractions, and the like).
Automating the Application of Data and Services
In various embodiments, the intelligent automated assistant systems may be configured or designed to include functionality for automating the application of data and services available over the Internet to discover, find, choose among, purchase, reserve, or order products and services.
In addition to automating the process of using these data and services, at least one intelligent automated assistant system embodiment disclosed herein may also enable the combined use of several sources of data and services at once. For example, it may combine information about products from several review sites, check prices and availability from multiple distributors, and check their locations and time constraints, and help a user find a personalized solution to their problem.
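To make the idea of combining several sources at once more concrete, here is a rough sketch. It assumes hard-coded stand-ins for two review sites and a handful of retailer offers. The data, the site and store names, and the recommend function are all hypothetical and are not taken from the patent.

```python
from statistics import mean

# Hypothetical stand-ins for external review services (item -> rating).
REVIEWS = {
    "site_a": {"camera_x": 4.5, "camera_y": 3.8},
    "site_b": {"camera_x": 4.1, "camera_y": 4.0},
}

# Hypothetical stand-ins for price and availability data from distributors.
OFFERS = [
    {"item": "camera_x", "seller": "store_1", "price": 299.0, "in_stock": True},
    {"item": "camera_x", "seller": "store_2", "price": 279.0, "in_stock": False},
    {"item": "camera_y", "seller": "store_1", "price": 249.0, "in_stock": True},
]


def recommend(max_price):
    """Merge ratings, price and availability into one ranked list of offers."""
    candidates = []
    for offer in OFFERS:
        # Apply the user's constraints: the item must be available and affordable.
        if not offer["in_stock"] or offer["price"] > max_price:
            continue
        ratings = [site[offer["item"]] for site in REVIEWS.values()
                   if offer["item"] in site]
        if not ratings:
            continue  # skip items that no review source knows about
        candidates.append({**offer, "rating": mean(ratings)})
    # Best average rating first; the cheaper offer wins a tie.
    return sorted(candidates, key=lambda c: (-c["rating"], c["price"]))


best = recommend(max_price=300)
```

The ranking rule (average rating, then price) is only one possible choice; the patent text speaks more generally of helping a user find a personalized solution.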
Additionally, at least one intelligent automated assistant system described in the patent may be configured or designed to include functionality for automating the use of data and services available over the Internet to discover, investigate, select among, reserve, and otherwise learn about things to do (including but not limited to movies, events, performances, exhibits, shows and attractions); places to go (including but not limited to travel destinations, hotels and other places to stay, landmarks and other sites of interest, etc.); places to eat or drink (such as restaurants and bars), times and places to meet others, and any other source of entertainment or social interaction which may be found on the Internet.
Additionally, at least one intelligent automated assistant system configuration disclosed in the patent may be configured or designed to include functionality for enabling the operation of applications and services via natural language dialog that may be otherwise provided by dedicated applications with graphical user interfaces including search (including location-based search); navigation (maps and directions); database lookup (such as finding businesses or people by name or other properties); getting weather conditions and forecasts, checking the price of market items or status of financial transactions; monitoring traffic or the status of flights; accessing and updating calendars and schedules; managing reminders, alerts, tasks and projects; communicating over email or other messaging platforms; and operating devices locally or remotely (e.g., dialing telephones, controlling light and temperature, controlling home security devices, playing music or video, etc.).
Personalized Recommendations
Further, at least one intelligent automated assistant system could be configured or designed to include functionality for identifying, generating, and/or providing personalized recommendations for activities, products, services, source of entertainment, time management, or any other kind of recommendation service that benefits from an interactive dialog in natural language and automated access to data and services.
Siri, Initiating and Controlling iOS Operations on Your Devices
In various embodiments, the intelligent automated assistant of the present invention could control many features and operations of an electronic device. For example, the intelligent automated assistant could call services that interface with functionality and applications on a device via APIs or by other means, to perform functions and operations that might otherwise be initiated using a conventional user interface on the device. Such functions and operations may include, for example, setting an alarm, making a telephone call, sending a text message or email message, adding a calendar event, and the like. Such functions and operations may be performed as add-on functions in the context of a conversational dialog between a user and the assistant. Such functions and operations could be specified by the user in the context of such a dialog, or they may be automatically performed based on the context of the dialog. One skilled in the art will recognize that the assistant could thereby be used as a control mechanism for initiating and controlling various operations on the electronic device, which may be used as an alternative to conventional mechanisms such as buttons or graphical user interfaces.
Conceptual Architecture
Apple's patent FIG. 1 shows a simplified block diagram of a specific example embodiment of an intelligent automated assistant 1002. As noted earlier, different embodiments of intelligent automated assistant systems may be configured.
Apple's patent FIG. 6 below is a block diagram depicting a system architecture illustrating several different types of clients and modes of operation which interestingly includes future use with car navigation systems, voice control systems and in-car entertainment systems.
That's kind of interesting considering that a new patent application from Japan's Honda surfaced today revealing a new Human-Machine Interface System for their car consoles that is clearly intended to work with Apple's iPod/iPhone devices. Honda states that their invention will allow vehicle operators to control a number of components within the vehicle via a single human-machine interface. The system, on one level, will offer a touch screen panel. Further into the patent, they reveal a voice control feature in a second layer of their new console system. Hmm, I wonder if Apple is talking to Honda about integrating Siri! Time will tell, said smilingly. (See Honda patent application 20120013548).
Active Ontology
Apple's patent FIG. 8 shown below covers "Active Ontology" involving representations of a restaurant and meal event. In this example, a restaurant is a concept with properties such as the name of the cuisine it serves and its location, which in turn might be modeled as a structured node with properties for street address. The concept of a meal event might be modeled as a node including a dining party which has a particular size and time period. Active ontologies may include and/or make reference to domain models. This, along with the example of travel services, is also supported in Apple's trademark filing for Siri, which Patently Apple discovered in Hong Kong's database this afternoon.
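The restaurant and meal event concepts described above map naturally onto nested data structures. The sketch below is only an illustration of that shape, not the patent's data model: the class and field names are assumptions, and the link from a meal event to a restaurant is our own addition for the sake of a complete example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Location:
    street_address: str  # the location is itself a structured node


@dataclass
class Restaurant:
    cuisine: str
    location: Location


@dataclass
class DiningParty:
    size: int


@dataclass
class MealEvent:
    restaurant: Restaurant  # assumption: the event refers to a restaurant node
    party: DiningParty
    start: datetime
    end: datetime

    def duration_minutes(self) -> int:
        """Length of the event's time period, in whole minutes."""
        return int((self.end - self.start).total_seconds() // 60)


event = MealEvent(
    restaurant=Restaurant(cuisine="Italian", location=Location("1 Main St")),
    party=DiningParty(size=4),
    start=datetime(2012, 1, 20, 19, 0),
    end=datetime(2012, 1, 20, 21, 0),
)
```

An assistant holding a partially filled MealEvent would know which properties are still missing (party size, time, and so on) and could ask for them in conversation.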
Apple's patent FIG. 8 depicts an event planning task flow model which models the planning of events independent of domains, applied to a domain-specific kind of event.
Apple's patent FIG. 9 illustrates an example of an alternative embodiment of intelligent automated assistant system wherein domain models, vocabulary, language pattern recognizers, short term personal memory, and long term personal memory components are organized under a common container associated with active ontology and other components such as active input elicitation component(s), language interpreter, and dialog flow processor are associated with active ontology via API relationships.
Multimodal Active Input Elicitation
Apple's patent FIG. 26 shown below illustrates a flow diagram depicting a method for multimodal active input elicitation. Inputs may be received concurrently from one or more or any combination of the input modalities, in any sequence. Thus, the method includes actively eliciting typed input, speech input, GUI-based input, input in the context of a dialog, and/or input resulting from event triggers.
Any or all of these input sources are unified into a unified input format and returned. Unified input format 2690 enables the other components of the intelligent automated assistant to be designed and to operate independently of the particular modality of the input.
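The value of a unified input format is easiest to see in code. The following sketch assumes a hypothetical UnifiedInput record and three small adapter functions. None of these names come from the patent, and a real system would carry far more information than a text string and a modality label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedInput:
    """Hypothetical common record that every input modality is converted into."""
    text: str
    modality: str  # "typed", "speech" or "gui" in this sketch


def from_typed(raw: str) -> UnifiedInput:
    return UnifiedInput(raw.strip(), "typed")


def from_speech(transcript: str) -> UnifiedInput:
    # Assumes a speech recognizer has already produced a transcript.
    return UnifiedInput(transcript.strip(), "speech")


def from_gui(button_label: str) -> UnifiedInput:
    return UnifiedInput(button_label.strip(), "gui")


def handle(request: UnifiedInput) -> str:
    """Downstream component: it reads only the text and never checks the modality."""
    return "request: " + request.text.lower()


results = [
    handle(from_typed("  Call Mom ")),
    handle(from_speech("call mom")),
    handle(from_gui("Call Mom")),
]
```

All three entries in results are identical, which is the property the patent describes: the rest of the assistant can be designed without caring how the request arrived.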
Offering active guidance for multiple modalities and levels enables constraint and guidance on the input beyond those available to isolated modalities. For example, the kinds of suggestions offered to choose among speech, text, and dialog steps are independent, so their combination is a significant improvement over adding active elicitation techniques to individual modalities or levels.
Combining multiple sources of constraints as described herein (syntactic/linguistic, vocabulary, entity databases, domain models, task models, service models, and the like) and multiple places where these constraints may be actively applied (speech, text, GUI, dialog, and asynchronous events) provides a new level of functionality for human-machine interaction.
Multiphase Output Procedure
Apple's patent FIG. 39 shown below illustrates a flowchart depicting an example of a multiphase output procedure according to one embodiment. The multiphase output procedure includes the automated assistant processing and multiphase output steps.
In step 710, a speech input utterance is obtained and a speech-to-text component (such as the component described in connection with FIG. 22) interprets the speech to produce a set of candidate speech interpretations 712. In one embodiment, the speech-to-text component is implemented using, for example, Nuance Recognizer, available from Nuance Communications, Inc. of Burlington, Mass. Candidate speech interpretations 712 may be shown to the user in 730, for example in paraphrased form. For example, the interface might show "did you say?" alternatives listing a few possible alternative textual interpretations of the same speech sound sample.
In at least one embodiment, a user interface is provided to enable the user to interrupt and choose among the candidate speech interpretations. In step 714, the candidate speech interpretations are sent to a language interpreter 1070 (of FIG. 9 above), which may produce representations of user intent 716 for at least one candidate speech interpretation. In step 732, paraphrases of these representations of user intent are generated and presented to the user.
In at least one embodiment, the user interface enables the user to interrupt and choose among the paraphrases of natural language interpretations.
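The phases described above (candidate speech interpretations, then representations of user intent, then paraphrases) can be sketched as a small pipeline. This is a toy illustration under heavy assumptions: the candidate strings are supplied by hand rather than by a real recognizer, and the interpret function understands only "call <name>" and "text <name>". All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intent:
    action: str
    target: str


def interpret(candidate: str) -> Optional[Intent]:
    """Toy language interpreter: recognizes 'call <name>' and 'text <name>' only."""
    words = candidate.lower().split()
    if len(words) == 2 and words[0] in ("call", "text"):
        return Intent(words[0], words[1].capitalize())
    return None  # this candidate has no usable interpretation


def paraphrase(intent: Intent) -> str:
    verb = {"call": "Calling", "text": "Texting"}[intent.action]
    return verb + " " + intent.target


def multiphase_output(candidates: List[str]) -> List[str]:
    """Return, in order, what the user would be shown at each phase."""
    shown = []
    # First phase: show the raw candidate interpretations (cf. step 730).
    shown.append("Did you say: " + " / ".join(candidates) + "?")
    # Next: derive intents for the candidates that can be interpreted (cf. steps 714, 716).
    intents = [i for i in map(interpret, candidates) if i is not None]
    # Then: show paraphrases of those intents (cf. step 732).
    shown.extend(paraphrase(i) for i in intents)
    return shown


shown = multiphase_output(["call tom", "call tim", "cold tom"])
```

Because output is produced after each phase rather than only at the end, a user interface built this way could let the user interrupt and pick an alternative before the assistant commits to an action.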
Multimodal Output Processing
Apple's patent FIG. 42 is a flowchart depicting an example of multimodal output processing according to one embodiment.
The method begins 600. The output processor takes a uniform representation of the response and formats the response according to the device and modality that is appropriate and applicable. Step 612 may include information from device and modality models and/or domain data models.
Once the response has been formatted, any of a number of different output mechanisms could be used, in any combination. Examples depicted in FIG. 42 include: Generating text message output, which is sent to a text message channel; Generating email output, which is sent as an email message; Generating GUI output, which is sent to a device or web browser for rendering; Generating speech output, which is sent to a speech generation module.
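As a rough sketch of the formatting step, the function below takes one uniform response and renders it for each of the four output channels listed above. The dictionary layout, the modality labels and the 160-character limit for text messages are assumptions made for this example, not details from the patent.

```python
import html


def format_response(response: dict, modality: str) -> str:
    """Render one uniform response for a specific output channel."""
    title, detail = response["title"], response["detail"]
    if modality == "text_message":
        # Assumption: keep text messages within a single 160-character message.
        return (title + ": " + detail)[:160]
    if modality == "email":
        return "Subject: " + title + "\n\n" + detail
    if modality == "gui":
        # Escape the content so it is safe to hand to a browser for rendering.
        return "<h1>" + html.escape(title) + "</h1><p>" + html.escape(detail) + "</p>"
    if modality == "speech":
        # Plain sentences, ready to pass to a speech generation module.
        return title + ". " + detail
    raise ValueError("unsupported modality: " + modality)


response = {"title": "Reminder set", "detail": "Dinner at 7 PM"}
outputs = {m: format_response(response, m)
           for m in ("text_message", "email", "gui", "speech")}
```

Keeping the response uniform until this last step means a new output channel only needs one more branch here, without touching the rest of the assistant.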
Some of the Major Segments Listed in Apple's Invention
To give you an idea of the scope of this invention, here are some of the segments covered in Apple's specification: Hardware Architecture, Conceptual Architecture, User Interaction, Intelligent Automated Assistant Components, Active Ontologies, Active Input Elicitation Components, Active Typed Input Elicitation, Active Speech Input Elicitation, Active GUI-Based Input Elicitation, Active Dialog Suggestion Input Elicitation, Active Monitoring for Relevant Events, Multimodal Active Input Elicitation, Domain Models Components, Language Interpreter Components, Domain Entity Databases, Vocabulary Components, Language Pattern Recognizer Components, Dialog Flow Processor Components, Task Flow Components, Services Orchestration Components, Output Processor Components, Short and Long Term Memory Components, Automated Call Response Procedure, Conceptual Data Model, Filtering and Sorting Results, Precedence Ordering, Paraphrase and Prompt Text and Prompts when Users Click on Active Links.
Apple Devices that could soon use Siri According to Apple's Patent
Apple's patent application lists a great number of devices beyond the iPhone that Siri may service in the future. They include the iPod touch (personal digital assistant), iMac (desktop computer), MacBook (laptop computer), iPad (tablet computer), consumer electronic devices, consumer entertainment devices, iPod (music player), camera, television, Apple TV (set-top box), electronic gaming unit, kiosk, or the like.
An electronic device for implementing the present invention may use any operating system such as, for example, iOS or Mac OS X, available from Apple.
Apple's patent application 20120016678 was originally filed in Q2 2010 by Siri's inventors Thomas Gruber, Adam Cheyer, Dag Kittlaus, Didier Guzzoni, Christopher Brigham, Richard Giuli, Marcello Bastea-Forte and Harry Saddler.
Today there's a special Apple Event discussing education and books. Imagine a day in the not-too-distant future when a student will simply ask Siri a question about what their lesson is about and get an answer or tutorial to make that lesson more real and relevant to the student. Wow, what a day that will be.
Notice: Patently Apple presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent and Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. Revelations found in patent applications shouldn't be interpreted as rumor or fast-tracked according to rumor timetables. Apple's patent applications have provided the Mac community with a clear heads-up on some of Apple's greatest product trends including the iPod, iPhone, iPad, iOS cameras, LED displays, iCloud services for iTunes and more. About Comments: Patently Apple reserves the right to post, dismiss or edit comments.
Here are a Few Great Sites covering our Original Report
MacSurfer, Twitter, Facebook, Real Clear Technology, Scoop Itally, Apple Investor News, Google Reader, Macnews, iPhone World Canada, MarketWatch, Techmeme, BGR, CNET, ZDNet Australia, iDevice Romania, MSNBC's Technolog, WebProNews, Electricpig UK, NBC Bay Area, Omicrono Spanish, Hexus UK, PadGadget, Spiegel Germany, Stuff TV UK, GigaOM, iDownloadBlog, Computerworld, Hardwareluxx Germany, App-News Germany, MacDailyNews, iPad-3 Netherlands, and more.
Note: The sites that we link to above offer you an avenue to make your comments about this report in other languages. These great community sites also provide our guests with varying takes on Apple's latest invention. Whether they're pro or con, you may find them to be interesting, fun or feisty. If you have the time, join in!
@Huhster: Nuance provides a speech-to-text component. How that component is put together is not being patented in this patent; I'm sure Nuance has patented that process/method. Instead, the use of such a component in conjunction with many other described (and named) components is being patented by Apple in this patent.
Posted by: kevin | January 20, 2012 at 10:20 PM
@Huhster: That's a good question. Yet if you look at many products like an iPod or iPhone "iFixit" breakdown, you see that a product will use parts from different companies that I'm sure are licensed. So it's not the parts but the whole that matters. If you've created something unique collectively, and the brains behind it (software) is mainly yours, I could see how it could be patentable. Nuance didn't create an intelligent assistant like Siri. So there's no conflict here at all.
Posted by: Brian Fontane | January 20, 2012 at 05:35 AM
How does Apple patent something they license from Nuance?
Posted by: Huhster | January 19, 2012 at 12:04 PM