Apple Reveals iPhone Subtitle & Audio Fingerprinting Features
A new Apple patent application published by the United States Patent and Trademark Office today reveals that Apple could be adding easy-to-use subtitle and closed captioning features to the iPhone and other media players such as Apple TV. The system would also give users the ability to control the look of on-screen subtitles by choosing font styles, colors, sizes and even the style of box the text is presented in. The patent's last feature takes advantage of ambient noise technology so that closed captioning could be triggered automatically when a mobile user is watching a movie or TV show in a noisy environment such as a subway, bus, park or gym.
Patent Background
A video could include subtitles or closed captions. The subtitles or closed captions could provide a translation or a transcript of the spoken dialogue in a video and optionally other information. Closed captions are useful to hearing-impaired viewers. Subtitles are useful for viewing foreign language videos or for viewing videos in a noisy environment. Subtitles and closed captions are typically invoked on a mobile device by selecting an option from a menu screen. On some devices, navigating menus and selecting audio options can be a cumbersome process that requires the user to perform multiple actions or steps.
You only have to read that description to realize that invoking subtitles on an iPhone (or other Apple media device) isn't exactly an easy process, and that is precisely what Apple's patent aims to simplify, with a rather interesting twist.
Patent Summary
Ambient noise sampled by a mobile device from a local environment is used to automatically trigger actions associated with content currently playing on the mobile device. In some implementations, subtitles or closed captions associated with the currently playing content are automatically invoked and displayed on a user interface based on a level of ambient noise.
In some implementations, audio associated with the currently playing content is adjusted or muted. Actions could be automatically triggered based on a comparison of the sampled ambient noise, or an audio fingerprint of the sampled ambient noise, with reference data, such as a reference volume level or a reference audio fingerprint. In some implementations, a reference volume level could be learned on the mobile device based on ambient noise samples.
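Stripped of patent language, what the summary describes is essentially a threshold comparison: sample the room, compare against a reference, react. Here's a minimal sketch in Swift of what that trigger logic could look like; the threshold value and the sampleAmbientNoiseLevel helper are invented for illustration and aren't specified in the filing.

```swift
import Foundation

// Hypothetical reference level in decibels (0 dB = full scale).
// The patent leaves the actual threshold, and how it is learned, open.
let referenceNoiseLevel: Float = -20.0

// Stand-in for a routine that samples ambient noise from the microphone
// and returns an average power level in dB.
func sampleAmbientNoiseLevel() -> Float {
    return -15.0  // placeholder value for illustration
}

func reactToAmbientNoise() {
    let level = sampleAmbientNoiseLevel()
    if level > referenceNoiseLevel {
        // Noisy environment: invoke subtitles and mute the soundtrack.
        print("Subtitles on, audio muted")
    } else {
        print("Subtitles off, audio restored")
    }
}

reactToAmbientNoise()
```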
QuickTime's Closed Captioning Option
Apple's patent FIG. 2 shown below illustrates an example of content playing in full screen mode including an overlying and partially transparent navigation panel 202 or "heads up" display. The navigation panel could contain one or more navigation elements which could be used to invoke navigation operations on the currently playing content (e.g., video, audio, slideshow, keynote presentation, television broadcast, webcast, videocast).
The user could turn closed captioning on or off by touching a closed captioning element 210. The user could specify a language preference by touching a language menu element 212 to invoke a language option sheet 300, as described in reference to FIG. 3 below.
My iPod touch has element 210, but it's for language rather than closed captioning, and the language support is limited at that. So the element noted as 210 appears to have changed, or is about to change, into a closed captioning element, while element 212 appears to be the new element for language. Whether we'll see this in iOS 4 in a few weeks is yet to be determined.
The patent also notes that in some implementations, the video content could be a television broadcast, videocast, webcast, Internet broadcast, etc. In some implementations, the language option sheet described in reference to FIG. 3 below could be generated by a service (e.g., by a cable headend) or a set-top box which is likely referencing Apple TV.
Example Language and Subtitle Option Sheet
Apple's patent FIG. 3 illustrates an example of a video played in full screen mode, including an overlying and partially transparent option sheet 300. The option sheet includes a display element 302 showing language options for audio associated with the currently playing video. In the example shown, the language options include English, English (Director's Commentary), and Spanish. Other languages could also be included as options (e.g., French, German).
The option sheet also includes a display element 304 showing options for subtitles associated with the currently playing video. Options for subtitles could include color, font and style in addition to language. For example, the user could select an option to show the subtitles in a frame surrounding the video (e.g., letterbox mode) or overlying the video (e.g., full screen mode). Those options reside below the Subtitle box, and the user will have to scroll screen 300 to access the controls for fonts and the like.
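For developers, the options FIG. 3 exposes map naturally onto a small preferences model. The Swift sketch below is purely illustrative; none of the type or property names come from the patent.

```swift
// Hypothetical model for the subtitle options the option sheet describes:
// language, font, color, box style, and letterbox vs. overlay placement.
struct SubtitlePreferences {
    enum Placement { case letterbox, fullScreenOverlay }

    var language = "English"
    var fontName = "Helvetica"
    var fontSize = 18.0
    var textColor = "white"
    var boxStyle = "semi-transparent"
    var placement = Placement.letterbox
}

var prefs = SubtitlePreferences()
prefs.language = "Spanish"
prefs.placement = .fullScreenOverlay
```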
Example Process for Ambient Noise Based Augmentation of Content
Apple's patent FIG. 4, shown below, is a flow diagram of an example process 400 for ambient noise based augmentation of content.
Ambient noise present in the local environment of the mobile device is sampled by the device (404). In some implementations, the ambient noise is received through a microphone of the iPhone. One or more actions are then performed on the iPhone based on the sampled ambient noise (406). At least one action could be performed on, or associated with, the content currently playing on the iPhone.
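On a modern iPhone, the sampling step could be approximated with AVFoundation's built-in audio metering. The sketch below is one plausible approach, not the patent's implementation; session setup and error handling are deliberately abbreviated.

```swift
import AVFoundation

// Sketch of step 404: metering ambient noise through the microphone.
// The recording file is only a sink, since we care about the meter
// reading, not the audio itself.
func currentAmbientLevel() throws -> Float {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .default)
    try session.setActive(true)

    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("ambient.caf")
    let settings: [String: Any] = [
        AVFormatIDKey: Int(kAudioFormatAppleLossless),
        AVSampleRateKey: 44_100.0,
        AVNumberOfChannelsKey: 1
    ]
    let recorder = try AVAudioRecorder(url: url, settings: settings)
    recorder.isMeteringEnabled = true
    recorder.record()

    // Give the recorder a brief moment to fill its meter.
    Thread.sleep(forTimeInterval: 0.5)
    recorder.updateMeters()
    let level = recorder.averagePower(forChannel: 0)  // dB, 0 = full scale
    recorder.stop()
    return level
}
```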
Audio fingerprints of ambient noise for various environments could be stored on the iPhone. Different actions could be taken for different environments. Thus, the iPhone could identify its local environment by sampling the ambient noise present in that environment, computing an audio fingerprint from the sampled ambient noise, and comparing that fingerprint with reference audio fingerprints stored in a database to find a match, thereby identifying a type of ambient noise or environment.
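Real audio fingerprinting involves far more signal processing than can be shown here, but the match-against-references flow the patent describes might look roughly like the following in Swift. The fingerprint representation, the similarity measure and the 0.9 cutoff are all assumptions made for the sketch.

```swift
// A toy "fingerprint": a vector of per-band energies. This only
// illustrates the reference-matching flow, not real fingerprinting.
typealias Fingerprint = [Float]

struct ReferenceEnvironment {
    let name: String          // e.g. "gym", "subway"
    let fingerprint: Fingerprint
}

// Cosine similarity between two equal-length vectors.
func similarity(_ a: Fingerprint, _ b: Fingerprint) -> Float {
    let dot = zip(a, b).map(*).reduce(0, +)
    let magA = (a.map { $0 * $0 }.reduce(0, +)).squareRoot()
    let magB = (b.map { $0 * $0 }.reduce(0, +)).squareRoot()
    guard magA > 0, magB > 0 else { return 0 }
    return dot / (magA * magB)
}

// Return the best-matching reference above a minimum similarity.
func identifyEnvironment(sample: Fingerprint,
                         references: [ReferenceEnvironment]) -> ReferenceEnvironment? {
    let best = references.max { similarity(sample, $0.fingerprint) <
                                similarity(sample, $1.fingerprint) }
    guard let match = best, similarity(sample, match.fingerprint) > 0.9 else {
        return nil
    }
    return match
}
```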
A table of actions could be associated with the reference audio fingerprints. The table of actions could be accessed by a processor on the mobile device which then carries out the actions. For example, there could be a different volume adjustment factor associated with each reference audio fingerprint. One environment may be noisier than another environment. These differences in ambient noise would be captured by two different audio fingerprints.
In a first environment (e.g., a gym), the ambient noise could be very loud and would require a large increase in volume or an invocation of subtitles. In a second environment (e.g., a shopping mall), the ambient noise could be lower than in the first environment and would require a smaller increase in volume and possibly no invocation of subtitles at all.
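The action table could be as simple as a dictionary keyed by the identified environment. In this Swift sketch, the environments, volume factors and actions are invented examples in the spirit of the gym and shopping mall scenarios above.

```swift
// Sketch of the per-environment action table the patent describes.
enum PlaybackAction {
    case showSubtitles
    case adjustVolume(factor: Float)
    case mute
    case pause
}

// Hypothetical mappings; a noisier environment gets stronger actions.
let actionTable: [String: [PlaybackAction]] = [
    "gym":           [.adjustVolume(factor: 1.5), .showSubtitles],
    "shopping mall": [.adjustVolume(factor: 1.2)],
    "library":       [.mute, .showSubtitles]
]

func perform(actionsFor environment: String) {
    for action in actionTable[environment] ?? [] {
        switch action {
        case .showSubtitles:            print("Subtitles on")
        case .adjustVolume(let factor): print("Volume x\(factor)")
        case .mute:                     print("Audio muted")
        case .pause:                    print("Playback paused")
        }
    }
}

perform(actionsFor: "gym")
```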
In some implementations, one or more actions could include the automatic invocation and/or display of subtitles and/or closed captions with the currently playing content. These could be accompanied by other automatic actions, such as muting or adjusting the volume (up or down) of the currently playing content. In one example, the action could be pausing the currently playing content rather than muting or adjusting its volume.
For mobile devices that include one or more sensors or controls (e.g., GPS, ambient light sensors, accelerometers), one or more actions could be triggered based on the ambient noise and/or input from at least one control or sensor. For example, the GPS system could provide the iPhone with its position coordinates. One or more actions triggered by ambient noise received through the microphone could be recorded in a database, together with a descriptor for the action. So when a user is in a gym and subtitles are invoked due to the loud ambient noise, the location of the mobile device is recorded. Each time the user returns to the recorded location, the gym in this example, the action is automatically performed.
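Combining the noise trigger with GPS amounts to recording an action descriptor alongside a location and replaying it on return. A rough Swift sketch using Core Location follows; the 100-meter radius and the in-memory store are assumptions, not details from the filing.

```swift
import CoreLocation

// Sketch of the patent's "remember the action for this place" idea.
struct RecordedAction {
    let descriptor: String        // e.g. "invoke subtitles"
    let location: CLLocation
}

var recordedActions: [RecordedAction] = []

func recordAction(_ descriptor: String, at location: CLLocation) {
    recordedActions.append(RecordedAction(descriptor: descriptor,
                                          location: location))
}

// Replay any action previously recorded within 100 meters.
func actionsToReplay(at current: CLLocation) -> [String] {
    return recordedActions
        .filter { $0.location.distance(from: current) < 100 }
        .map { $0.descriptor }
}

// Usage: record once at the gym, replay on the next visit.
let gym = CLLocation(latitude: 37.33, longitude: -122.03)
recordAction("invoke subtitles", at: gym)
print(actionsToReplay(at: gym))   // ["invoke subtitles"]
```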
In some implementations, only the first action in a sequence of actions is invoked based on ambient noise. For example, ambient noise could trigger a mute function, and the triggering of the mute function could in turn trigger the invocation of subtitles or closed captioning.
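That chaining could be expressed by having each action invoke the next, as in this toy Swift sketch (the function names are, again, hypothetical).

```swift
// Ambient noise triggers only the first action; the rest cascade.
func onHighAmbientNoise() {
    mute()                 // first action, triggered by noise
}

func mute() {
    print("Audio muted")
    invokeSubtitles()      // second action, triggered by the mute itself
}

func invokeSubtitles() {
    print("Subtitles displayed")
}

onHighAmbientNoise()
```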
Apple credits Joel Kraut as the sole inventor of patent application 20100146445, originally filed in Q4 2008.
Other Patent Applications Published Today
Patent 20100139990 - Selective Input Signal Rejection and Modification: This is an interesting patent that generally relates to various types of complex multi-touch pad environments such as those used in Apple's MacBook, Magic Mouse or even the iPod's clickwheel where you have both touch and pressure or "Pick Sensing" technology working in unison.
Patent 20100146318 - Battery Gas Gauge Reset Mechanism; Patent 20100142730 - Crossfading of Audio Signals; Patent 20100142134 - Cold Worked Metal Housing for a Portable Electronic Device; Patent 20100142164 - Electrical Components Coupled to Circuit Boards; Patent 20100140068 - Stiffening Plate for Circuit Board and Switch Assembly - and Patent 20100139085 - Techniques for Reducing Wasted Material on a Printed Circuit Board Panel.
Continuation Patents or Patents Previously Published: Patent 20100145949 - Methods and Systems for Managing Data; Patent 20100145908 - Synchronization Methods and Systems - and Patent 20100145691 - Global Boundary-Centric Feature Extraction and Associated Discontinuity Metrics.
For additional information on any patent presented here today, simply feed the individual patent number noted in this report into this search engine.
Notice: Patently Apple presents only a brief summary of patents with associated graphic(s) for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for further details.