While swapping faces or creating synthetic media with photo-editing software is not a new idea, the technique behind deepfakes matters because it lets almost anyone produce the same results without any artistic skill, and on a much larger scale.
What Is a Deepfake?
Deepfake, a portmanteau of “deep learning” and “fake,” refers to media that has been digitally altered to replace a person’s face or body with that of another.
How Does a Deepfake Work?
Deepfakes use advanced deep learning techniques to first encode features, then reconstruct images from the encoded features. Autoencoders, a type of neural network, are the most commonly used deep learning architecture for creating deepfakes. Let’s explore this through an example of using the deepfake technique to swap faces of two people.
The first step to producing a deepfake is transforming the face images into smaller feature-based representations using an encoder. This compact representation is often referred to as the latent face. The latent face will contain representations of features such as the nose shape, skin tone and eye color. We use the same encoder for each person so the representations produced have the same meaning.
We then transform the latent face back into an image using a decoder. Depending on which images we use to train the decoder, the output face image will vary. The key part for face swapping is that the decoder for person A is applied to the latent face of person B, and vice versa. In this way, the output face will have the pose and expression of person B, but the style and look of person A.
This figure shows the deepfake process through training (top two rows) and the creation of a deepfake (bottom row). In the training phase, the encoder (blue) is shared between different images to create the latent faces (middle portraits) from the input images (left portraits).
Individual decoders are trained for each person (green and pink). In the creation phase, the same encoder is used, but the decoder for a different person is used. This leads to an image that has the pose and expression of the input image but the style of the person used to create the decoder.
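The shared-encoder, per-person-decoder arrangement described above can be sketched in a few lines of NumPy. This is a minimal illustration under loud assumptions, not how any particular deepfake tool is implemented: the class name `FaceSwapAutoencoder`, the 16-value "images," the latent size of 4 and the random stand-in data are all hypothetical, and the encoder and decoders are single linear layers where real systems use deep convolutional networks.

```python
import numpy as np


class FaceSwapAutoencoder:
    """Toy linear autoencoder: one shared encoder, one decoder per person."""

    def __init__(self, img_dim, latent_dim, rng):
        # The encoder is shared, so latent faces mean the same thing for A and B.
        self.encoder = 0.1 * rng.normal(size=(img_dim, latent_dim))
        # Each person gets an individual decoder.
        self.decoders = {
            "A": 0.1 * rng.normal(size=(latent_dim, img_dim)),
            "B": 0.1 * rng.normal(size=(latent_dim, img_dim)),
        }

    def encode(self, faces):
        return faces @ self.encoder

    def decode(self, latent, person):
        return latent @ self.decoders[person]

    def train_step(self, batches, lr=0.1):
        """One gradient-descent step on the mean squared reconstruction error."""
        enc_grad = np.zeros_like(self.encoder)
        total_loss = 0.0
        for person, faces in batches.items():
            latent = self.encode(faces)
            err = self.decode(latent, person) - faces
            total_loss += float(np.mean(err ** 2))
            scale = 2.0 / err.size
            dec_grad = scale * latent.T @ err
            # The shared encoder accumulates gradients from both people.
            enc_grad += scale * faces.T @ (err @ self.decoders[person].T)
            self.decoders[person] -= lr * dec_grad
        self.encoder -= lr * enc_grad
        return total_loss


rng = np.random.default_rng(0)
model = FaceSwapAutoencoder(img_dim=16, latent_dim=4, rng=rng)

# Random stand-ins for flattened face images of person A and person B.
batches = {"A": rng.normal(size=(100, 16)), "B": rng.normal(size=(100, 16))}

# Training phase: each decoder learns to reconstruct its own person.
losses = [model.train_step(batches) for _ in range(300)]

# Creation phase: encode a face of person B, then decode with A's decoder.
latent_b = model.encode(batches["B"][:1])
swapped = model.decode(latent_b, "A")
```

The swap itself is only the last two lines: the input image supplies the latent face, and the choice of decoder determines whose appearance is rendered.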
Are Deepfakes Only Videos?
While deepfakes are most often presented in videos, they can be found in photos and even audio.
The key applications of deepfake technology include:
- Face swap: Face swap is the most common and obvious technique, where the faces of two people are swapped. This is generally done to produce an image of a celebrity in a scene they weren’t in.
- Face synthesis: Face synthesis stretches reality and generates a face for a person that never existed.
- Facial attributes and expression manipulation: In attribute and expression manipulation, a face is altered either by changing specific features such as the eyes, or by changing its expression, for example, turning a frown upside down.
How to Spot a Deepfake
With the wide availability of deepfake generation tools, it’s important that everyone has a basic understanding of how to spot a deepfake. There are deepfake detection tools available to the public, but these don’t always identify deepfakes. As a result, it’s important to be aware of the ways to detect a deepfake:
- Unnatural face, environment or lighting: Deepfake images or sections of videos can have unnatural facial expressions, facial feature placement or jagged edges. The environment itself (such as the lighting) can also be unrealistic.
- Unnatural behavior: Deepfake videos need continuity from one frame to the next, but this is difficult to achieve. As a result, you might spot unnatural behaviors such as uneven blinking or choppy motion.
- Image artifacts and blurriness: Deepfake images may have weird artifacts such as blurriness around the neck where the body of one person is stitched together with the face of another.
- Audio: When deepfakes are combined with audio, the lips may follow an unexpected motion compared to what you would expect from the audio.
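As a rough illustration of the blurriness cue above, one common sharpness measure is the variance of an image's Laplacian: a softer region tends to score lower than a crisp one. The sketch below is not a deepfake detector, and it makes several assumptions: the helper names `laplacian_variance` and `box_blur` are hypothetical, the "frame" is random noise standing in for a grayscale image, and the soft band is fabricated with a simple 3x3 mean filter.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian; lower values suggest a blurrier patch."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def box_blur(gray):
    """3x3 mean filter, used here only to fabricate a blurry region."""
    g = np.asarray(gray, dtype=float)
    padded = np.pad(g, 1, mode="edge")
    h, w = g.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return sum(shifted) / 9.0


rng = np.random.default_rng(0)
frame = rng.random((64, 64))                 # stand-in grayscale frame
frame[40:56, :] = box_blur(frame)[40:56, :]  # simulate a soft band, e.g. near the neck

sharp_score = laplacian_variance(frame[8:24, :])
soft_score = laplacian_variance(frame[40:56, :])
```

A region whose score is far below the rest of the frame is only a hint worth a closer look. Compression, motion and shallow depth of field can also blur parts of a genuine image.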
How Do You Make a Deepfake?
Creating deepfakes is surprisingly easy. A wide array of smartphone apps makes deepfake production readily accessible to a majority of the population. In addition to apps, computer programs allow a person to produce far more advanced results on local CPUs (central processing units), or the best results on GPUs (graphics processing units).
Some of the most popular tools are:
- FaceApp: FaceApp transforms photos to add and remove features, thereby allowing users to craft celebrity-like photos in just a few clicks.
- Wombo: Wombo creates singing and dancing video clips from users’ uploaded photos.
- Deepfakes Web: Deepfakes Web is a cloud-based application that can realistically swap out faces of people in videos.
- Face Swap Live: Face Swap Live lets users swap faces with someone during live videos and with figures in images.
What Are the Risks of Deepfakes?
As with many technological innovations, deepfakes have had both positive and negative effects.
Pornographic Deepfakes
Research has estimated that 98 percent of deepfake videos are created for pornography, in which creators produce videos that place celebrity faces onto models. This application has led some countries to ban pornographic deepfakes while others grapple with deepfake pornography epidemics; various websites have stated deepfake pornography is against their terms and conditions.
In 2022, Google banned the use of its Colab service for the creation of deepfakes, a previously popular way to train deepfake models. And as of 2024, YouTube requires creators to disclose to viewers when realistic content is made with synthetic media or generative AI, including content “using the likeness of a realistic person.” Meta has also begun labeling AI-generated images on Facebook, Instagram and Threads as determined by industry-standard indicators. These measures from Google and Meta aim to further reduce the prevalence of deepfakes.
Political Deepfakes
The application of deepfakes in the arena of politics is likely the most controversial. During the 2022 Russian invasion of Ukraine, deepfakes of Vladimir Putin and Volodymyr Zelenskyy depicted each surrendering to the other side. And as part of his 2024 presidential campaign, Donald Trump posted deepfake images that made it appear as if Taylor Swift and her fans were endorsing him. The influence of deepfakes has reached the point where both Microsoft and United States lawmakers are now calling for stronger legal safeguards.
The high quality of deepfakes makes it difficult for the general public to discern fact from fiction. Even if a deepfake is later revealed to be fake, the video can already have done significant damage.
Frequently Asked Questions
What is a deepfake?
A deepfake refers to media that is digitally altered to replace a person's face or body with that of another. Deepfakes are created using deep learning technology, hence the term being a portmanteau of "deep learning" and "fake."
Are deepfakes illegal?
Deepfakes can be illegal when created for the purposes of nonconsensual and explicit content, defamation, misinformation or intentional emotional distress.
Can deepfakes be detected?
Deepfakes can be detected by identifying unnatural facial, environmental or behavioral features, such as blurriness, uneven blinking, choppy body movements or lip movements that don't match the audio.
Deepfake detection software can also be used to detect deepfake media.
How can you tell if a video is AI-generated?
An AI-generated video may be identified by the following characteristics:
- Unusual blurring, shadows or light flickers on elements throughout the video
- Unnatural body language or composition
- Nonrealistic, glossy or plastic-like textures and surfaces
- Overly saturated or vivid colors
- Inconsistent or choppy movements
- Objects, bodies or body parts suddenly appearing, disappearing or clipping through one another
What is the most common deepfake?
The most common type of deepfake is face swapping, where the face of one person is superimposed onto the body of another person in an image or video. This is often used to make a celebrity appear as if they were in a scene when they never were.