Apple Invents a Coding System for VR Applications to Better Handle Omnidirectional and Multi-Directional Images
Today the US Patent & Trademark Office published a patent application from Apple that relates to coding techniques for omnidirectional and multi-directional images and videos. The first patent covering this topic was posted last October titled "Apple Invents a 360° Head-Mounted Display System that Relates to Post-Production of VR Applications." As the title reveals, that patent covered the post-production side of VR applications, whereas the patent application coming to light today covers the coding side of things.
For instance, Apple notes that today's coding applications do not account for the image distortions that can arise when processing omnidirectional or multi-directional images. These distortions can cause ordinary video coders to fail to recognize redundancies in image content, which leads to inefficient coding. Apple's invention aims to overcome these coding inefficiencies.
To better understand the context of Apple's project, we can see from the 2017 patent that the focus is on a headset and the imagery created from post-production of VR applications. In last year's patent filing, Apple noted that conventional 180-degree or 360-degree videos and/or images are stored in flat storage formats that use equirectangular or cubic projections to represent spherical space. If these videos and/or images are edited in conventional editing or graphics applications, it is difficult for the user to judge how the final result will look when the video or images are distributed and presented as a dome projection, a cubic projection, or mapped spherically within a virtual reality head-mounted display. Editing and manipulating such images in these flat projections requires special skill and much trial and error.
Further, it is not uncommon to realize, after manipulating images or videos composited or edited with spherical projections, that subsequent shots are misaligned, or that stereoscopic parallax points do not match in a natural way.
Apple's 2017 invention was generally directed to methods and systems for transmitting monoscopic or stereoscopic 180-degree or 360-degree still or video images from a host editing or visual effects program, as an equirectangular or other spherical projection, to a simultaneously running program on the same device. That program can continuously acquire orientation and position data from the orientation sensors of a wired or wirelessly connected head-mounted display and simultaneously render a representative monoscopic or stereoscopic view of that orientation to the head-mounted display, in real time.
Apple's patent FIG. 5 shown above from the 2017 patent illustrated an example for presenting a preview image by a 3D display device. FIG. 5 illustrated the user interface of a video or image editing or graphics manipulation software program #501 with an equirectangularly projected spherical image displayed in the canvas #502 and a compositing or editing timeline #503.
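To give a feel for what rendering a headset preview from an equirectangular source involves, here is a minimal sketch (our own illustration, not Apple's implementation; the function name and axis conventions are assumptions): given a world-space view ray derived from the head-mounted display's orientation, it finds the source pixel that covers that ray.

```python
import math

def direction_to_equirect(dx, dy, dz, width, height):
    """Inverse lookup used when rendering a headset view: given a
    world-space view-ray direction (dx, dy, dz), with +Y up, return
    the (x, y) coordinate of the equirectangular source pixel that
    covers it. Longitude spans the image width; latitude its height."""
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)          # -pi .. pi around the vertical axis
    lat = math.asin(dy / norm)        # -pi/2 .. +pi/2 off the horizon
    x = (lon + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - lat) / math.pi * height
    return (x, y)

# Looking straight down +Z lands on the center of the image.
x, y = direction_to_equirect(0.0, 0.0, 1.0, 1024, 512)
```

Repeating this lookup for every pixel of a pinhole view, with rays rotated by the headset's yaw and pitch, yields the kind of live preview the 2017 patent describes.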
Processing of Equirectangular Object Data to Compensate for Distortion by Spherical Projections
In Apple's current patent background they note that some modern imaging applications capture image data from multiple directions about a camera. Some cameras pivot during image capture, which allows a camera to capture image data across an angular sweep that expands the camera's effective field of view. Some other cameras have multiple imaging systems that capture image data in several different fields of view. In either case, an aggregate image may be created that represents a merger or "stitching" of image data captured from these multiple views.
Many modern coding applications are not designed to process such omnidirectional or multi-directional image content. Such coding applications are designed based on an assumption that image data within an image is "flat" or captured from a single field of view.
Thus, the coding applications do not account for image distortions that can arise when processing these omnidirectional or multi-directional images with the distortions contained within them. These distortions can cause ordinary video coders to fail to recognize redundancies in image content, which leads to inefficient coding.
Accordingly, the inventors of Apple's 2018 patent perceive a need in the art for coding techniques that can process omnidirectional and multi-directional image content and limit distortion.
Apple's patent FIG. 1 below illustrates a system #100 that may include at least two terminals #110 and #120 interconnected via a network #130. The first terminal #110 may have an image source that generates multi-directional and omnidirectional video. The terminal also may include coding systems and transmission systems (not shown) to transmit coded representations of the multi-directional video to the second terminal #120, where it may be consumed.
For example, the second terminal may display the spherical video on a local display, execute a video editing program to modify it, integrate it into an application (for example, a virtual reality program), present it in a head-mounted display (for example, for virtual reality applications), or store it for later use.
Embodiments of the present disclosure find application with laptop computers, tablet computers, smartphones, servers, media players, virtual reality head-mounted displays, augmented reality displays, hologram displays, and/or dedicated video conferencing equipment.
Apple's patent FIG. 2 above is a functional block diagram of a coding system. The system #200 may include an image source, an image processing system, a video coder, a video decoder, a reference picture store, a predictor and a pair of spherical transform units (#270 and #280).
The image source may generate image data as a multi-directional image, containing image data of a field of view that extends around a reference point in multiple directions. The image processing system may convert the image data from the image source as needed to fit requirements of the video coder #230.
The video coder may generate a coded representation of its input image data, typically by exploiting spatial and/or temporal redundancies in the image data. The video coder may output a coded representation of the input data that consumes less bandwidth than the input data when transmitted and/or stored.
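The temporal-redundancy idea can be made concrete with a toy example. The sketch below (our own illustration, not Apple's coder) performs exhaustive block matching against a reference picture; a real video coder would transmit the winning offset plus a small residual, which costs far fewer bits than the raw block. The distortions Apple describes matter precisely because they can prevent a search like this from finding a good match.

```python
def best_match(block, ref, search=2):
    """Exhaustive block matching: find the offset (dy, dx) within a
    small search window of the reference frame that minimizes the sum
    of absolute differences (SAD) against the current block."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            sad = 0
            for y in range(h):
                for x in range(w):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < len(ref) and 0 <= rx < len(ref[0]):
                        sad += abs(block[y][x] - ref[ry][rx])
                    else:
                        sad += 255  # penalize samples outside the reference
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best  # (SAD, dy, dx) of the best prediction

# A block copied from the reference is found with zero residual.
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
block = [[60, 70],
         [100, 110]]
sad, dy, dx = best_match(block, ref)
```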
For those digging deeper into this patent later, you'll find the "coding system" described in more detail from patent point #0022 through to patent point #0026.
Apple's patent FIG. 3 above illustrates three image sources (#310, #340 and #370) that find use with embodiments of the present disclosure.
A first image source may be a camera #310, shown in FIG. 3(a), that has a single image sensor (not shown) that pivots along an axis. During operation, the camera may capture image content as it pivots through a predetermined angular distance (preferably a full 360 degrees) and merge the captured image content into a 360° image.
The capture operation may yield an equirectangular image #320 having predetermined dimensions of M × N pixels. Optionally, the equirectangular image may be transformed to a spherical projection.
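The equirectangular format maps longitude and latitude linearly onto the image axes, which is what introduces the distortion near the poles that the patent is concerned with. A minimal sketch of the pixel-to-direction mapping (our own conventions, assumed for illustration):

```python
import math

def equirect_to_direction(x, y, width, height):
    """Map an equirectangular pixel (x, y) in an M x N image to a
    unit 3D direction on the sphere, with +Y up. Longitude spans the
    full 360° horizontally; latitude spans 180° vertically."""
    lon = (x / width) * 2.0 * math.pi - math.pi     # -pi .. pi
    lat = math.pi / 2.0 - (y / height) * math.pi    # +pi/2 (top) .. -pi/2
    return (
        math.cos(lat) * math.sin(lon),  # X
        math.sin(lat),                  # Y (up)
        math.cos(lat) * math.cos(lon),  # Z
    )

# The center pixel looks straight down the +Z axis.
d = equirect_to_direction(512, 256, 1024, 512)
```

Note that near the top and bottom rows an entire row of pixels collapses toward a single point on the sphere, which is exactly the kind of stretch a flat-image coder misreads.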
Apple's patent FIG. 3(b) above (the middle row) illustrates image capture operations of another type of image source, an omnidirectional camera #340. In this embodiment, a camera system may perform a multi-directional capture operation and output a cube map picture #360 having dimensions of M × N pixels in which image content is arranged according to a cube map capture #350.
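A cube map stores the sphere as six square faces, with each face pixel corresponding to a view direction through one face of a unit cube. The face names and orientations below are illustrative assumptions; the patent figure uses its own arrangement:

```python
def cubemap_to_direction(face, u, v):
    """Map face-local coordinates (u, v) in [-1, 1] to a 3D view
    direction through the named face of a unit cube centered on the
    camera. Normalizing the result gives a point on the sphere."""
    faces = {
        "+x": (1.0, -v, -u),
        "-x": (-1.0, -v, u),
        "+y": (u, 1.0, v),
        "-y": (u, -1.0, -v),
        "+z": (u, -v, 1.0),
        "-z": (-u, -v, -1.0),
    }
    return faces[face]

# The center of the +z face looks straight down the +Z axis.
d = cubemap_to_direction("+z", 0.0, 0.0)
```

Distortion here is mildest at each face's center and grows toward the corners, where a face pixel covers a noticeably larger patch of the sphere, so coders again see stretched content across face seams.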
Apple's patent FIG. 3(c) illustrates image capture operations of another type of image source, a camera #370 having a pair of fish-eye lenses. In this embodiment, each lens system captures data in a different 180° field of view, representing opposed "half shells."
The camera may generate an image #380 by stitching the images generated from each lens system. Fish-eye lenses typically induce distortion based on object location within each half-shell field of view.
Apple's patent application 20180234700 was originally filed back in Q1 2017. If you're a coder, you could dig into the finer details here. Considering that this is a patent application, the timing of such a product to market is unknown at this time.
Patently Apple presents a detailed summary of patent applications and/or granted patents with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. About Making Comments on our Site: Patently Apple reserves the right to post, dismiss or edit any comments. Those using abusive language or exhibiting negative behavior will be blacklisted on Disqus.