Apple has won a patent for the creation of deepfakes that alter the facial expression and pose of a person in a photo or video.
According to Wikipedia, Deepfakes are synthetic media in which a person in an existing image or video is replaced with someone else's likeness. While the act of creating fake content is not new, deepfakes leverage powerful techniques from machine learning and artificial intelligence to manipulate or generate visual and audio content that can more easily deceive. The main machine learning methods used to create deepfakes are based on deep learning and involve training generative neural network architectures, such as autoencoders, or generative adversarial networks (GANs).
Deepfakes have garnered widespread attention for their potential use in creating celebrity pornographic videos, fake news, hoaxes, bullying, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.
Last weekend I stumbled on the Amazon Prime mystery-thriller series "The Capture" (second season). A clip from season two is presented below.
To back up this concept, below is a video of a deepfake that is perhaps more convincing and closer to reality. It's a 2018 video; one can only imagine what could be done four years later.
So, what does this have to do with Apple? Well, yesterday the U.S. Patent Office granted Apple a patent for just that: deepfake, or synthetic, creations. The patent is titled "Face Image Generation with Pose and Expression Control." Of course, it's not as sophisticated as the TV series or the Obama presentation at present, but it definitely illustrates how Apple thinks this could be a future photo-manipulation feature and/or app for still photos and videos. In fact, Apple already has the technology in place. More on that later.
Apple's newly granted patent notes that their invention covers systems and methods that relate to the creation of synthetic images of human faces based on a reference image. The synthetic images can incorporate changes in facial expression and pose.
At inference time, a single reference image can be used to generate an image that looks like the person (i.e., the subject) of the reference image, but shows the face of the subject according to an expression and/or pose that the system or method has not previously seen. Thus, the generated image is a simulated image that appears to depict the subject of the reference image, but it is not actually a real image.
As used in the patent, a real image refers to a photographic image of a person that represents the person as they appeared at the time that the image was captured.
As explained in the patent, the systems and methods described first modify a shape description for the subject's face according to a change in facial expression and a change in pose. This results in a target shape description (e.g., parameters for a statistical model of face shape) that can be used to render an image of a target face shape.
The target face shape incorporates a changed expression and/or pose relative to the reference image. The target face shape is rendered to generate a rendered target face shape image.
The target face sufficiently depicts major facial features (e.g., eyes and mouth) to convey position, shape, and expression for these features.
The rendered target face shape image and the reference image are provided to an image generator as inputs. The rendered version of the target face shape serves as a map that indicates the locations of facial features, and the reference image is used as a texture source to apply the appearance of the subject from the reference image to the rendered version of the target shape.
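The three-step flow described above (modify the shape description, render it, then feed the rendered map plus the reference image to a generator) can be sketched roughly as follows. This is a minimal, hypothetical illustration: the function names, the dict-based "shape description," and the toy renderer and generator are my own assumptions for clarity, not Apple's implementation.

```python
import numpy as np

def modify_shape(reference_shape, expression_delta, pose_delta):
    """Step 1: apply expression/pose changes to the reference shape
    description, producing the target shape description. Identity
    parameters are left unchanged."""
    return {
        "identity": reference_shape["identity"],
        "expression": reference_shape["expression"] + expression_delta,
        "pose": reference_shape["pose"] + pose_delta,
    }

def render_shape(shape, size=64):
    """Step 2: render the target shape into an image-like map marking
    major feature locations (toy example: head yaw shifts the eyes)."""
    shape_map = np.zeros((size, size))
    yaw = int(shape["pose"][0])
    shape_map[size // 3, size // 3 + yaw] = 1.0       # left eye
    shape_map[size // 3, 2 * size // 3 + yaw] = 1.0   # right eye
    return shape_map

def generate(shape_map, reference_image):
    """Step 3: stand-in for the trained image generator, which uses the
    shape map for feature locations and the reference image as a
    texture/identity source. A real system would be a neural network."""
    return 0.5 * shape_map + 0.5 * reference_image
```

The key design point is the separation of concerns: geometry (expression and pose) comes entirely from the rendered shape map, while appearance (the subject's identity) comes entirely from the reference image.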
The image generator is a trained machine learning model (e.g., neural network) that is configured to generate an image that looks like a realistic image of a human face, incorporates a face shape (e.g., including facial expression and pose) that is consistent with the face shape from the rendered version of the target face shape, and is consistent with the identity of the subject of the reference image (e.g., the person depicted in the generated image appears to be the same person as the subject of the reference image).
The image generator is trained to constrain generation of the output image based on the input image such that the output image appears to depict the subject of the input image.
The image generator may be part of a generative adversarial network, trained by concurrently training the generator to generate images and the discriminator to determine whether images are real or not real, correspond to the face shape from the rendered version of the target face shape, and correspond to the identity of the subject from the reference image.
Apple's patent FIG. 1 below is a block diagram that shows an image generation system that includes a shape estimator and an image generator; FIG. 2 is a block diagram that shows a shape estimator training system for the shape estimator.
Apple's patent FIG. 4 below is a block diagram that shows an image generator training system for the image generator; and FIG. 6 is a flowchart that shows a process for face image generation with pose and expression control.
The image generator training system #440 of FIG. 4 above is configured to train the image generator to output a generated image (#441) according to constraints that are learned by the image generator through a large number of iterations of a training procedure. The image generator training system is configured in the form of a generative adversarial network (a GAN, as referenced in the Wikipedia definition of deepfakes above) in which a generator generates synthetic images, a discriminator attempts to determine whether the images are real or synthetic (fake), and the result of the determination is used to further train both the generator and the discriminator.
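The alternating generator/discriminator updates described above can be sketched with a deliberately tiny, linear toy model. Everything here (the linear "networks," the learning rate, the single realism score) is an illustrative assumption; the patent's discriminator additionally scores shape consistency and identity match, which is noted in the comments but omitted from the toy math.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # toy image dimensionality
lr = 0.01       # learning rate for both players

def generator(g_w, shape_map, reference):
    # Toy generator: a weighted blend of the rendered shape map and
    # the reference image. A real generator is a deep network.
    return g_w[0] * shape_map + g_w[1] * reference

def discriminator(d_w, image):
    # Toy discriminator: a single "realism" score (higher = more real).
    # The patent's discriminator would also score shape consistency
    # and identity match against the reference subject.
    return float(d_w @ image)

g_w = np.array([0.5, 0.5])
d_w = rng.normal(size=D)

for step in range(100):
    shape_map = rng.normal(size=D)   # rendered target face shape
    reference = rng.normal(size=D)   # reference image of the subject
    real = rng.normal(size=D)        # a real training image

    fake = generator(g_w, shape_map, reference)

    # Discriminator update: raise the score on real images, lower it
    # on generated ones (gradient of d_w@real - d_w@fake w.r.t. d_w).
    d_w += lr * (real - fake)

    # Generator update: raise the fake image's score to fool the
    # discriminator (gradient of d_w@fake w.r.t. the blend weights).
    g_w += lr * np.array([d_w @ shape_map, d_w @ reference])
```

The point of the sketch is the adversarial loop itself: each determination by the discriminator produces a training signal for both players, exactly as the patent describes for system #440.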
For more details, review Apple's granted patent US 11475608 B2.
Our cover image is from BioID. Below is a brief video from BioID that shows how Apple could start the process using Face ID to create a mesh of your face that could then be manipulated. Does this particular process of creating a deepfake look familiar? Yes, of course. Today, Apple uses the iPhone's Face ID camera to create Memoji, which could easily lead to deepfake image manipulation. Though make no mistake: this patent isn't about Memoji. That technology came from Faceshift, a company Apple acquired in 2015. Patently Apple covered one of Faceshift's patents, now an Apple patent, in 2019 here.
Apple's latest granted patent is a completely different take on this technology, developed years after Memoji, and delves into manipulating photos and videos. It's clear that Apple could take this technology much further if they so choose. It'll be interesting to see how Apple will make deepfake manipulation a friendly and non-threatening application.
Key Apple Inventors
Barry-John Theobald: Machine Learning Manager (7 years). Spent 11 years as Associate Professor and Senior Lecturer at the University of East Anglia (UK).
Nicholas Apostoloff: Machine Learning Researcher