Apple Wins Key Patent for Light Field Cameras that could deliver an Immersive AR Experience for Macs, iDevices & Headset
The U.S. Patent and Trademark Office officially published a series of 68 newly granted patents for Apple Inc. today. In this particular report we cover an important patent related to future immersive augmented reality that could be used on future iMacs (as shown in our cover graphic in relation to advanced FaceTime sessions), on iDevices, and in a future headset. Tim Cook spoke to ABC News about users being able to talk with each other online and be very present while other things could be brought into the picture. This supports the vision of immersive augmented reality.
Granted Patent: Light Field Capture
Apple's newly granted patent covers their invention relating to operations, systems, and computer readable media to capture images of a scene using a camera array and process the captured images based on a viewer's point of view (POV) for immersive augmented reality, live display wall, head mounted display, video conferencing, and similar applications.
The use of immersive augmented reality, display wall, head mounted display, and video conference has increased in recent years. For example, a video conference is an online meeting that takes place between two or more parties, where each party can hear the voice and see the images of the other. In a video conference between two parties, each party participates through a terminal, e.g., a desktop computer system, a tablet computer system, TV screen, display wall, or a smart phone, at each site. A terminal typically comprises a microphone to capture audio, a webcam to capture images, a set of hardware and/or software to process captured audio and video signals, a network connection to transmit data between the parties, a speaker to play the voice, and a display to display the images. In such a traditional setup, a viewer could only see a fixed perspective of his counterparty and her scene. In particular, the viewer could only see what is captured by the counterparty's webcam. Further, as the viewer moves from one location to another during the conference, his point of view (POV) may change. However, due to limitations of the image capturing at the counterparty's site, the viewer could only see images from the same perspective all the time.
Light Field Capture (Camera)
Apple's invention describes a technology that relates to, and may be used in, image capture and processing for immersive augmented reality, live display wall, head mounted display, and video conferencing applications. In one embodiment, the disclosed subject matter provides a complete view to a viewer by combining images captured by a camera array. In another embodiment, the disclosed subject matter tracks the viewer's point of view (POV) as he moves from one location to another and displays images in accordance with his varying POV. The change of the viewer's POV is inclusive of movements in, for example, the X, Y, and Z dimensions.
In accordance with one embodiment, for example, during a video conference, each party participates through a terminal. Each terminal comprises a display, a camera array, an image processing unit (e.g., including hardware and/or software), and a network connection (e.g., through cable and/or wireless connections). Each camera array may comprise a plurality of cameras.
The camera array may capture images (e.g., color RGB, YUV, YCC, etc.). The camera array may also either capture depth directly, capture information from which depth can be computed (e.g., structured light, time of flight, stereo images, etc.), or compute depth by other means for each party, while simultaneously tracking that party's POV (e.g., represented by that party's head and/or eye positions). Data representative of a viewer's POV may be transmitted by the viewer's terminal and received by a speaker's terminal through the network connection.
Note that Apple's usage of the term "speaker" in the context above and below refers to a person, as noted in patent FIG. 6 below, and not an audio speaker.
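The POV hand-off described above (the viewer's terminal transmitting head/eye position data to the speaker's terminal) could be modeled along the following lines. This is a hypothetical sketch: the `ViewerPOV` structure and its JSON encoding are illustrative assumptions, not anything the patent specifies.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ViewerPOV:
    """Hypothetical POV sample: head position in the display's
    coordinate frame (meters), plus a capture timestamp."""
    x: float          # horizontal offset from display center
    y: float          # vertical offset from display center
    z: float          # distance from the display plane
    timestamp_ms: int # when this sample was tracked

def encode_pov(pov: ViewerPOV) -> bytes:
    """Serialize a POV sample for transmission over the network connection."""
    return json.dumps(asdict(pov)).encode("utf-8")

def decode_pov(payload: bytes) -> ViewerPOV:
    """Reconstruct the POV sample at the receiving (speaker's) terminal."""
    return ViewerPOV(**json.loads(payload.decode("utf-8")))
```

Because POV changes continuously as the viewer moves, samples like this would be sent throughout the session so the speaker's terminal can keep its culling up to date.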
Apple further notes that "The image processing unit within the speaker's terminal may process the captured images of the speaker based on the viewer's POV. In particular, image processing operations may comprise a culling operation that trims pixels from the captured images based on the viewer's POV and identifies remaining pixels. The purpose of the culling operation is to reduce the amount of data for processing. Because the processed data will ultimately be transferred from one party to the other, culling reduces the amount of data for transferring, saves bandwidth, and reduces latency.
After culling, the image processing unit may map the remaining pixels from individual cameras' three-dimensional (3-D) space to two-dimensional (2-D) display space. Next, data of the mapped pixels may be transmitted by the speaker's terminal and received by the viewer's terminal through the network connection. Subsequently, the image processing unit within the viewer's terminal may blend the mapped pixels and assemble an image ready for display (i.e., a "frame").
Separately, the speaker's POV may be used by the viewer's terminal to process captured images of the viewer. The image processing operations within the viewer's terminal may be a "mirror" process to those described within the speaker's terminal."
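As a rough illustration of the cull → map → blend pipeline described above, here is a minimal NumPy sketch. It is not Apple's implementation: culling is reduced to a simple visibility test against the viewer's POV, mapping to a pinhole projection into display space, and blending to per-cell averaging on a tiny display grid. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def cull_and_map(points_3d, colors, viewer_pov, focal=1.0, extent=1.0):
    """Speaker-side stage: cull 3-D points the viewer cannot see,
    then map the survivors into 2-D display space."""
    # Express each captured point relative to the viewer's POV.
    rel = points_3d - viewer_pov
    # Culling: drop points behind the viewer (less data to transmit).
    visible = rel[:, 2] > 0
    rel, colors = rel[visible], colors[visible]
    # Mapping: pinhole projection from 3-D space to 2-D display coordinates.
    uv = focal * rel[:, :2] / rel[:, 2:3]
    # Further culling: drop points that project outside the display.
    on_screen = np.all(np.abs(uv) <= extent, axis=1)
    return uv[on_screen], colors[on_screen]

def blend(mapped_per_camera, grid=8, extent=1.0):
    """Viewer-side stage: accumulate mapped pixels from every camera
    into one frame, averaging where cameras overlap."""
    frame = np.zeros((grid, grid, 3))
    counts = np.zeros((grid, grid, 1))
    for uv, colors in mapped_per_camera:
        # Quantize display coordinates onto the frame's pixel grid.
        ij = ((uv + extent) / (2 * extent) * (grid - 1)).astype(int)
        for (i, j), c in zip(ij, colors):
            frame[j, i] += c
            counts[j, i] += 1
    return frame / np.maximum(counts, 1)
```

In this toy version, culling happens per camera before any data is "transmitted," which mirrors the patent's stated motivation: trimming pixels early saves bandwidth and reduces latency.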
As one with ordinary skill in the art should appreciate, the terms "speaker" and "viewer" are used here to facilitate an explanation of the disclosed concepts. In a video conference, each party behaves as both speaker and viewer with respect to his/her counterparty. Thus, the image capture and processing operations, described above at the viewer's and the speaker's sites, take place simultaneously and continuously within the terminal at each site. This provides each party a continuous display of frames (i.e., a live video) of his/her counterparty based on this party's POV.
Further, the camera array may either stand-alone by itself or be integrated into the display at each site. For applications such as immersive augmented reality, live display wall, and head mounted display where there may be only one viewer all the time, the terminals may be asymmetric with a camera array only at the capture site to capture a scene to be viewed on the viewer's display, while there may be no camera array at the viewer's site.
The viewer's POV may be tracked by one or more cameras or other devices, separate from a camera array, dedicated for tracking purposes, and the speaker's POV may not be tracked.
In general, light field cameras are revolutionizing the way we view images in video and stills. Today's iPhone 7 Plus offers users the ability to choose Portrait mode when taking a photo. This feature automatically creates a photo with a bokeh effect, where the person or object in the foreground of the picture is crystal clear while the background imagery is gently blurred.
With light field technology added to an iPhone camera, users will be able to go one step further. A photo captured in Portrait mode could be altered after the fact, letting the user refocus the shot on another object after the photo is taken. It was a fantastic feature, and an example could once be seen here (the link is now broken, as Google acquired Lytro, the company behind the focus-anywhere technology, in March 2018). It showed a photo of a pole with posters on it with the background blurred; clicking on the bridge in the background of the photo automatically shifted the focus to the bridge and blurred the pole.
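A common way to get this refocus-after-capture effect is shift-and-sum refocusing over the light field's sub-aperture views. The sketch below is a generic, textbook-style illustration under simplifying assumptions (grayscale images, integer pixel shifts), not Lytro's or Apple's actual algorithm:

```python
import numpy as np

def refocus(views, alpha):
    """Shift-and-sum refocusing sketch.

    views: dict mapping an aperture offset (du, dv) to the grayscale
    image seen from that slightly shifted viewpoint.
    alpha: scales the per-view shift; changing it after capture moves
    the plane of focus.
    """
    acc = np.zeros_like(next(iter(views.values())), dtype=float)
    for (du, dv), img in views.items():
        # Shift each sub-aperture view in proportion to its offset,
        # so points at the chosen depth line up across all views.
        shifted = np.roll(img, (round(alpha * dv), round(alpha * du)), axis=(0, 1))
        acc += shifted
    return acc / len(views)
```

Varying `alpha` moves the focal plane: views aligned at one disparity reinforce each other, while objects at other depths smear into a bokeh-like blur — which is what clicking a new focal point in the Lytro demos effectively did.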
In another example (link also broken since Google's March 2018 acquisition of Lytro), the focus of the photo was on one of the penguins while the other was blurred by the bokeh effect. Simply clicking on the other penguin in the photo reset the focal point accordingly. Very cool.
One of my favorite sites that provided a nice overview of where light field photography is going (still and video) was Lytro's (March 2018: broken link due to Google acquiring the company).
On that note, the prospect of light field cameras coming to future iMacs for super advanced FaceTime conferencing, and to other iDevices and a stand-alone headset (or accessory) for other applications, is very exciting.
Apple's granted patent 9,681,096 was originally filed in Q3 2016 and published today by the US Patent and Trademark Office. Inventors include Gary Vondran, an Imaging Scientist, Camera Systems at Apple, while Monahar and Miller are listed as Camera Prototyping Engineers.
Patently Apple presents only a brief summary of granted patents with associated graphics for journalistic news purposes as each Granted Patent is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any Granted Patent should be read in its entirety for full details. About Making Comments on our Site: Patently Apple reserves the right to post, dismiss or edit any comments. Those using abusive language or negative behavior will result in being blacklisted on Disqus.