Apple Granted a Surprising Patent for an Advanced Multi-View Video Conferencing System for the Enterprise
The US Patent and Trademark Office officially published a series of 33 newly granted patents for Apple Inc. today. In this particular report we cover an invention relating to an advanced video conferencing system built around a multi-view camera system that employs scalable video encoding. On July 15 we posted a report titled "A New Landmark Partnership between Apple and IBM is about to transform the Way Work gets done." Apple's surprising video conferencing system would compete with Microsoft's Round Table conferencing device, and it's a system that Apple might be able to get IBM to promote in the enterprise.
Apple's Patent Background
Video conferencing is gaining traction due to the development of new applications and equipment that make establishing a video conference easy and convenient. However, the quality of the video content in the video conferences is generally low. Part of the quality issue relates to the large amount of bandwidth required to send high quality video between conference locations.
Another part of the quality issue relates to the awkward positioning of the video cameras that are used to capture video of the conference locations. Some configurations employ one or two cameras that provide views of most if not all of the conference location. As a result, the resolution of the video is relatively low with respect to any given participant. To increase the resolution of the video for participants, the cameras are moved closer to the active participant and then moved as different participants talk.
Various efforts have been made to address these issues. One noteworthy effort is by Microsoft and its Round Table conferencing device. The Round Table conferencing device sits in the middle of a conference table, provides a 360-degree view of the conference location, and tracks the flow of conversation among the active speakers. The audio content of the active speaker is provided to other conference locations along with video content of the 360-degree view of the conference location. As such, close-up, high quality video content of the conference participants is available. Unfortunately, the transport of high quality video content from one location to another is very bandwidth intensive.
Video conferencing systems, such as the Round Table conferencing device, generally employ extensive compression, or encoding, techniques to reduce the bandwidth required to transport the video content from one location to another. The extensive encoding results in a substantial decrease in the overall quality of the video content. Since the video content generally includes images of each of the participants, the quality of the portions of the video allocated to each of the participants, including the active speaker, is also decreased.
When the video content is being viewed at another location by a remote participant, the focus of the remote participant is generally on the active speaker and not on the other non-active participants that are included in the video content.
There is a need to provide higher quality video content for the active speaker and little need to provide higher quality video for the other non-active participants. Accordingly, there is a need for a video conferencing technique that is capable of providing higher quality video content for the active speaker while providing lower quality video content for the other non-active participants in a given conference location in a bandwidth efficient and effective manner.
Apple's Granted Patent
Apple's granted patent covers an invention that employs scalable video coding (SVC) in a multi-view camera system, which is particularly suited for video conferencing. Multiple cameras are oriented to capture video content of different image areas and generate corresponding original video streams that provide video content of the image areas.
An active one of the image areas may be identified at any time by analyzing the audio content originating from the different image areas and selecting the image area that is associated with the most dominant speech activity.
In a first embodiment, the video content from each of the original video streams is used to generate composite video content, which is carried in a composite video content stream. The composite video content may include multiple image windows, wherein each image window includes the video content of a corresponding image area.
The composite video content stream is encoded using SVC to provide an encoded video stream having at least a lower SVC layer and a higher SVC layer. The lower SVC layer includes base information from which the composite video content can be reconstructed at a lower quality level. The higher SVC layer includes enhancement information for a selected portion of the composite video content. The selected portion of the composite video content corresponds to the image window in which video content of the active image area is provided. The enhancement information provides supplemental coding information that, when used with corresponding base information, allows the selected portion of the composite video content to be reconstructed at a higher quality level when the encoded video stream is decoded.
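The selection and layering steps described above can be illustrated with a toy sketch. It is not the patent's implementation and not a real SVC codec. It assumes that "most dominant speech activity" can be approximated by the RMS level of each area's audio buffer, and it stands in for the two SVC layers with a coarsely quantized copy of the whole composite frame (base) plus pixel residuals for the active window only (enhancement). The names `select_active_area`, `encode_layers`, the window tuple layout, and the quantization step are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = List[List[int]]             # grayscale pixels, 0-255
Window = Tuple[int, int, int, int]  # (top, left, height, width), assumed layout


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio buffer (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_active_area(audio_by_area: Dict[str, Sequence[float]]) -> str:
    """Pick the image area whose audio has the highest RMS level.

    Assumption: loudness stands in for 'dominant speech activity'.
    """
    return max(audio_by_area, key=lambda area: rms(audio_by_area[area]))


def encode_layers(composite: Frame, window: Window,
                  step: int = 32) -> Tuple[Frame, Frame]:
    """Toy two-layer encoder (illustrative only, not H.264 SVC).

    base:        every pixel rounded down to a multiple of `step`.
    enhancement: original minus base, for the active window only.
    """
    top, left, height, width = window
    base = [[(p // step) * step for p in row] for row in composite]
    enhancement = [
        [composite[r][c] - base[r][c] for c in range(left, left + width)]
        for r in range(top, top + height)
    ]
    return base, enhancement


# Hypothetical conference room with three image areas.
audio = {
    "left": [0.1, -0.1, 0.05],
    "center": [0.8, -0.7, 0.9],
    "right": [0.0, 0.0, 0.0],
}
active = select_active_area(audio)

# A 2x4 composite frame; assume the active area's window is the right half.
composite = [[10, 40, 70, 100], [130, 160, 190, 220]]
base, enhancement = encode_layers(composite, window=(0, 2, 2, 2))
```

The point of the sketch is that the enhancement data covers only the active speaker's window, so the extra bits are spent where the remote viewer is most likely looking.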
The encoded video stream along with an audio stream for the selected audio content is encapsulated into an appropriate transport stream, such as a Real-Time Transport Protocol (RTP) stream, and delivered to a conference bridge or another conference location. The selected audio content may primarily correspond to that originating from the active image area or a mix of some or all of the different image areas. When the lower SVC layer and the higher SVC layer are used for decoding the encoded video stream at the conference bridge or other conference location, the selected portion of the composite video content is reconstructed at a higher quality level while the rest of the composite video content is reconstructed at the lower quality level. If the higher SVC layer is not available, the entirety of the composite video content may be reconstructed at the lower quality level. Once the composite video content is reconstructed, it may be presented to other conference participants in association with the selected audio content.
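The decoding behaviour described above, where the receiver falls back to the lower quality level when the higher layer is missing, can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's decoder or a real SVC implementation: the base layer is assumed to be a coarse full frame, the enhancement layer is assumed to be a grid of pixel residuals for the active window, and the function name `decode_layers` and the window tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]             # grayscale pixels, 0-255
Window = Tuple[int, int, int, int]  # (top, left, height, width), assumed layout


def decode_layers(base: Frame, window: Window,
                  enhancement: Optional[Frame] = None) -> Frame:
    """Reconstruct a composite frame from toy base/enhancement layers.

    With no enhancement layer, the whole frame stays at base quality.
    With one, only the pixels inside `window` are refined.
    """
    frame = [row[:] for row in base]  # copy so the base layer is untouched
    if enhancement is None:
        return frame
    top, left, height, width = window
    for r in range(height):
        for c in range(width):
            frame[top + r][left + c] += enhancement[r][c]
    return frame


# Hypothetical received data: coarse 2x4 frame, residuals for the right half.
base = [[0, 32, 64, 96], [128, 160, 160, 192]]
window = (0, 2, 2, 2)
enhancement = [[6, 4], [30, 28]]

full = decode_layers(base, window, enhancement)  # active window refined
fallback = decode_layers(base, window)           # higher layer unavailable
```

Because the base layer is complete on its own, a conference bridge or endpoint that drops the higher layer still gets a usable picture of every participant.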
Apple's granted patent FIG. 1 relates to the use of scalable video coding (SVC) in a multi-view camera system, which is particularly suited for video conferencing. Apple's granted patent FIG. 4 is a block representation of a multi-view camera system.
Apple credits Dany Sylvain as the sole inventor of granted patent 8,791,978, which was originally filed in Q2 2012 and published today by the US Patent and Trademark Office. To review today's granted patent claims and details, see Apple's patent.
Patently Apple presents only a brief summary of granted patents with associated graphics for journalistic news purposes as each Granted Patent is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any Granted Patent should be read in its entirety for full details.