Apple Reveals Powerful Pattern Detection Coming to iOS Cameras
Apple's iOS cameras will eventually gain very powerful pattern detection technologies. Although the iOS camera system will gain OCR and bar code scanning capabilities, Apple's future system will go far beyond that. The advanced camera system will be able to read ISBN numbers, pricing symbols, phone numbers and much more, all in context. For instance, you'll be able to scan a movie poster, as noted in our cover graphic. From that scan you'll be able to select a face and be given contextual menu options pertaining to that face, select a phone number and be given the option to add it to your contacts, or touch the movie image to call up a movie trailer, show times and other relevant information. You'll also be able to scan a URL from a magazine, touch the URL in the image on your iPad and be given the option to open it. Is that wild? This is really powerful technology that Apple will be building into next-generation iOS devices, and it will greatly benefit consumers and professionals alike.
The Problem: Current Pattern Detection Technologies Can't Identify Patterns in Images
Current technologies for searching for and identifying interesting patterns in a piece of text data locate specific structures in the text. A device performing a pattern search refers to a library containing a collection of structures, each structure defining a pattern that is to be recognized. A pattern is a sequence of so-called definition items. Each definition item specifies an element of the text pattern that the structure recognizes.
A definition item may be a specific string or a structure defining another pattern using definition items in the form of strings or structures. For example, a structure may give the definition of what is to be identified as a US state code (postal code). According to the definition, a pattern in a text will be identified as a US state code if it corresponds to one of the strings that make up the associated definition items, such as "AL", "AK", "AS", etc.
Another example structure may be a telephone number. A pattern will be identified as a telephone number if it includes a string of three digits, followed by a hyphen or space, followed by a string of four digits.
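To make the idea of structures and definition items concrete, here is a minimal Python sketch. It is not taken from the patent: the class names (`Structure`, `OneOf`, `Digits`), the `LIBRARY` list and the `find_patterns` function are hypothetical, the state code list is abbreviated, and the matcher ignores word boundaries for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class OneOf:
    """Definition item: matches any one of several literal strings."""
    options: List[str]


@dataclass
class Digits:
    """Definition item: matches exactly `count` ASCII digits."""
    count: int


@dataclass
class Structure:
    """A named pattern made of an ordered sequence of definition items.
    An item may itself be another Structure (a nested pattern)."""
    name: str
    items: List["Item"]


Item = Union[OneOf, Digits, Structure]


def match_item(item: Item, text: str, pos: int) -> Optional[int]:
    """Return the position just past `item` if it matches at `pos`, else None."""
    if isinstance(item, OneOf):
        for option in item.options:
            if text.startswith(option, pos):
                return pos + len(option)
        return None
    if isinstance(item, Digits):
        chunk = text[pos:pos + item.count]
        if len(chunk) == item.count and chunk.isascii() and chunk.isdigit():
            return pos + item.count
        return None
    # Nested Structure: every definition item must match in sequence.
    for sub in item.items:
        next_pos = match_item(sub, text, pos)
        if next_pos is None:
            return None
        pos = next_pos
    return pos


# Hypothetical library holding the two example structures from the text.
STATE_CODE = Structure("us_state_code", [OneOf(["AL", "AK", "AS"])])
PHONE = Structure("telephone_number", [Digits(3), OneOf(["-", " "]), Digits(4)])
LIBRARY = [STATE_CODE, PHONE]


def find_patterns(text: str) -> List[Tuple[str, str]]:
    """Scan left to right and return (structure name, matched text) pairs."""
    found = []
    pos = 0
    while pos < len(text):
        for structure in LIBRARY:
            end = match_item(structure, text, pos)
            if end is not None:
                found.append((structure.name, text[pos:end]))
                pos = end
                break
        else:
            pos += 1  # no structure matched here; move on one character
    return found
```

Calling `find_patterns("Call 555-1234 in AK")` would report one telephone number and one state code. A real detector would also need word-boundary checks so that, say, the first two letters of an ordinary capitalized word are not mistaken for a state code.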
These pattern detection technologies only work to identify patterns in pieces of text data. In modern data processing systems, however, important data may be contained in forms other than simple text.
One example of such a form of data is an image, such as a JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), or other image file format. An image may be received at a data processing system, for example in an email or multimedia messaging service (MMS) message, or the image may be taken by a camera attached to the device.
The image may be of a document, sign, poster, etc. that contains interesting information. Current pattern detection technologies cannot identify patterns in the image that can be used by the data processing system to perform certain commands based on the context.
Apple's Solution
Apple's invention relates to identifying important information in an image that can be used by a data processing system, such as an iPad, iPhone, iPod touch, MacBook or other device, to perform certain commands based on the context of the information. For the sake of brevity and clarity, from this point forward the term "data processing system" will be replaced by "iPad" so as to keep the language and imagery simple.
A text recognition module identifies textual information in the image. To identify the textual information, the text recognition module performs a text recognition process on image data corresponding to the image. The text recognition process may include optical character recognition (OCR).
A user interface provides a user with a contextual processing command option based on the data type of the pattern in the textual information. The data processing system executes the contextual processing command in an application of the system.
In certain embodiments, the application may include one of a phone application, an SMS (Short Message Service) and MMS (Multimedia Messaging Service) messaging application, a chat application, an email application, a web browser application, a camera application, an address book application, a calendar application, a mapping application, a word processing application, and a photo application.
In one embodiment, a facial recognition module scans the image and identifies a face in the image using facial recognition processing. The facial recognition processing extracts landmarks, such as the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw, from the face and compares the landmarks to a database of known faces. The user interface provides the user with a contextual processing command option based on the identification of the face in the image.
New Ability of Scanning Multiple Patterned Images in Context
Apple's patent FIG. 4A illustrated above shows us the user experience provided by an iPad with integrated image detection and contextual commands. In this example, patent FIG. 4A illustrates an image 400 of a promotional movie poster. The user can take a photo of the poster with their iPad. The poster contains a number of pieces of textual data which may be of interest to a user. Either automatically upon taking a photo of the poster or at the request of the user by an input command, the text recognition module may scan the poster for recognized characters and output a stream of character codes.
Pattern search engine 232 searches the textual stream for known patterns and classifies them according to the provided structures and rules 234. In this poster example, the following patterns may be recognized:
Movie title 402; ISBN (International Standard Book Number) 404; website address 406; price value 408; album art 410; phone number 412; email address 414; street address 416; and barcode 418.
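As a rough illustration of this search-and-classify step, the Python sketch below runs a few regular expressions over a recognized text stream and labels each hit with a data type. This is an assumption-laden stand-in, not Apple's pattern search engine: the `PATTERNS` table, the data type labels and the `classify` function are hypothetical, and the regexes are deliberately loose.

```python
import re

# Hypothetical data type labels paired with deliberately simple regexes.
PATTERNS = [
    ("email_address", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("website_address", re.compile(r"(?:https?://|www\.)\S+")),
    ("phone_number", re.compile(r"\(?\d{3}\)?[ -]\d{3}-\d{4}")),
    ("price_value", re.compile(r"\$\d+(?:\.\d{2})?")),
]


def classify(text_stream):
    """Return (data_type, matched_text, start_offset) tuples in reading order."""
    hits = []
    for data_type, regex in PATTERNS:
        for match in regex.finditer(text_stream):
            hits.append((data_type, match.group(), match.start()))
    # Sort by position so the hits follow the order of the recognized text.
    return sorted(hits, key=lambda hit: hit[2])


poster_text = "Tickets $12.50 at www.example.com or call (555) 123-4567 info@example.com"
for data_type, text, offset in classify(poster_text):
    print(data_type, text, offset)
```

Keeping the start offset with each hit is a design choice for this sketch: it would let a user interface map a detected pattern back to the region of the image it came from.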
In Apple's patent FIG. 4B shown below we see a user who has highlighted area #450 on their iPad or iPhone, which triggers a contextual menu providing the user with options to select. In patent FIG. 4C we see that the list of contextual command options includes Open Link in Browser, Add to Bookmarks and Add to Address Book. Of course, the contextual commands may be different depending on the data type you're scanning.
Different Scan Patterns will Present Different Contextual Menus to the User
A data detection module identifies a pattern in the textual information and determines a data type of the pattern. The data detection module may compare the textual information to a definition of a known pattern structure. Here are a few examples:
Movie Related: Although not illustrated, the following commands may be relevant to the various identified patterns described above. For movie title 402, the commands may include offering more information on the movie such as show times, theater locations, a movie trailer, ratings and reviews. The information may be retrieved from a movie website over a network that may offer the user the ability to purchase tickets to an upcoming showing of the movie or purchase or rent it.
Book Related: For ISBN number 404, the commands may include offering more information on the book (e.g., title, author, publisher, reviews, excerpts, etc.) and offering to purchase the book from an online merchant, if available.
Pricing Related: For price value 408, the commands may include adding the price to an existing note (e.g., a shopping list) and comparing the prices to other prices for the same item at other retailers.
Time Related: For date and time 409, the commands may include adding an associated event to an entry in a calendar application or a task list, which may include adding it to an existing entry or creating a new calendar entry.
Music Related: For album art 410, the commands may include offering more information on the album (e.g., artist, release date, track list, reviews, etc.), offering to buy the album from an online merchant, and offering to buy concert tickets for the artist.
Phone Related: For phone number 412, the commands may include calling the phone number, sending an SMS or MMS message to the phone number, and adding the phone number to an address book, which may include adding it to an existing contact entry or creating a new contact entry.
Email Related: For email address 414, the commands may include sending an email to the email address, and adding the email address to an address book, which may include adding it to an existing contact entry or creating a new contact entry.
Map Related: For street address 416, the commands may include showing the street address on a map, determining directions to/from the street address from/to a current location of the data processing system or other location, and adding the street address to an address book, which may include adding it to an existing contact entry or creating a new contact entry.
Bar Code Related: For barcode 418, the commands may include offering more information on the product corresponding to the barcode which may be retrieved from a website or other database, and offering to buy the product from an online merchant, if available. In response to the user selection of one of the provided contextual command options, the processing system may cause the action to be performed in an associated application.
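The relationship between a detected data type and its contextual menu can be sketched as a simple lookup table plus a dispatcher. The Python below is only an illustration: the `COMMANDS` table, the data type keys, the command labels beyond those named in FIG. 4C, and the `handlers` argument standing in for the target applications are all assumptions.

```python
# Hypothetical mapping from data type to the commands offered in its menu.
COMMANDS = {
    "website_address": ["Open Link in Browser", "Add to Bookmarks", "Add to Address Book"],
    "phone_number": ["Call", "Send Message", "Add to Address Book"],
    "email_address": ["Send Email", "Add to Address Book"],
    "street_address": ["Show on Map", "Get Directions", "Add to Address Book"],
    "price_value": ["Add to Note", "Compare Prices"],
}


def context_menu(data_type):
    """Return the command labels to show for a detected data type."""
    return list(COMMANDS.get(data_type, []))


def run_command(data_type, command, value, handlers):
    """Pass the detected value to the handler for the chosen command.

    `handlers` maps a command label to a callable that stands in for the
    application (phone, browser, address book, ...) that performs it.
    """
    if command not in context_menu(data_type):
        raise ValueError(f"{command!r} is not offered for {data_type!r}")
    return handlers[command](value)


# Usage: a stand-in "phone application" that just reports what it would do.
handlers = {"Call": lambda number: "dialing " + number}
print(context_menu("phone_number"))
print(run_command("phone_number", "Call", "555-1234", handlers))
```

Keeping the menu definitions in data rather than in code means a new data type, such as an ISBN or a barcode, would only need a new table entry and handlers for its commands.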
Overview of the System: Facial Recognition, Image Detection & Contextual Commands
Apple's patent FIG. 5 is a block diagram illustrating an iPad or iPhone with integrated image detection, including facial recognition, and contextual commands.
In FIG. 5 below we see that a facial recognition module scans an image represented by image data 110 after text recognition module 120 has identified textual data and data detection module 130 has identified any recognizable patterns in the textual data.
Upon receiving the image data, facial recognition module 550 may perform facial recognition processing on the data to identify any faces in the image represented by the image data. In one embodiment, the facial recognition processing employs one or more facial recognition algorithms to identify faces by extracting landmarks, or features, from an image of the subject's face. For example, an algorithm may analyze the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw. These features are then used to search for other images with matching features.
The features may be compared with known images in a database 552 which may be stored locally in data processing system 500 or remotely accessible over a network. Other algorithms normalize a gallery of face images and then compress the face data, only saving the data in the image that is useful for face detection. A probe image is then compared with the face data.
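Comparing extracted features against a database of known faces can be pictured as a nearest-neighbor search. The Python sketch below assumes each face has already been reduced to a short vector of landmark measurements; the `KNOWN_FACES` entries, the feature values, the `threshold` and the function names are invented for illustration and do not come from the patent.

```python
import math

# Hypothetical database: name -> landmark feature vector, for example
# normalized eye spacing, nose length and jaw width. Values are made up.
KNOWN_FACES = {
    "Alice": [0.42, 0.31, 0.77],
    "Bob": [0.55, 0.29, 0.64],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(features, database=KNOWN_FACES, threshold=0.1):
    """Return the closest known name, or None if no face is within `threshold`."""
    best_name, best_dist = None, float("inf")
    for name, known in database.items():
        d = distance(features, known)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


print(identify([0.43, 0.30, 0.76]))  # close to Alice's stored features
print(identify([0.90, 0.90, 0.10]))  # far from every known face
```

The threshold matters as much as the distance measure: without it, every face, including a stranger on a poster, would be reported as the nearest entry in the database.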
Generally, facial recognition algorithms can be divided into two main approaches: geometric, which looks at distinguishing features; or photometric, which is a statistical approach that distills an image into values and compares the values with templates to eliminate variances. The facial recognition algorithms employed by the facial recognition module may include Principal Component Analysis with eigenface, Linear Discriminant Analysis, Elastic Bunch Graph Matching with fisherface, the Hidden Markov model, neuronal motivated dynamic link matching, or other algorithms.
As shown in FIG. 7 below, the user highlights the face on the poster and is given a contextual menu. In this example, the highlighted field is a detected face. For the detected face, the corresponding commands in context menu 740 may include confirming the recognized identity of the detected face, adding the face to a contact list or address book, which may include adding it to an existing contact entry or creating a new contact entry, performing any action in the contact entry associated with the detected face (e.g., calling a phone number in the contact entry, sending an email to an email address in the contact entry, pulling up a social networking website associated with the contact entry, etc.), or searching the internet for information on the detected face. In response to the user selection of one of the provided contextual command options, the iPad may cause the action to be performed in an associated application. Of course scanning a photo of a movie star, in this example, might not provide you with the ability to access their information (ha!). But you get the drift here.
Patent Credits
Apple's patent application was originally filed in Q4 2010 by inventors Cedric Bray and Olivier Bonnet of Apple France.
It should also be noted that another patent specifically relating to bar codes was published today. If you're interested in bar code technology for iOS devices, then check out patent application 20120080515.
Notice: Patently Apple presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. Revelations found in patent applications shouldn't be interpreted as rumor or fast-tracked according to rumor timetables. Apple's patent applications have provided the Mac community with a clear heads-up on some of Apple's greatest product trends including the iPod, iPhone, iPad, iOS cameras, LED displays, iCloud services for iTunes and more. About Comments: Patently Apple reserves the right to post, dismiss or edit comments.