Less than three weeks after its release, a new AI-powered service called Deep Nostalgia has become an internet sensation. Some call it creepy; for others, it’s magical. Licensed by the genealogy site MyHeritage from D-ID (an Israeli company that specializes in video reenactment), this deep learning-based technology lets users animate the faces of their loved ones in still photos.
How Does Deep Nostalgia Work?
The process of creating an animation on the site is simple:
- Create an account on MyHeritage (app or site).
- Upload a photo to the “Animate Photo” service.
- Select the face that you wish to animate. Voila!
To date, the app has been used to animate over 33 million photos (according to the Deep Nostalgia website) and has ranked as the number 1 free app on the iOS App Store in many countries. Let’s scratch the surface a little and understand what fuels this technology.
Here is a quick breakdown of how these images come to life:
“D-ID’s groundbreaking Reenactment technology brings still photos to life. Our process connects your picture to a “driver” video, animating and voicing the subject of the picture so that they speak, smile, blink, move their head, and match any movement in the “driver” video. Using AI-technology, D-ID has added a new dimension to profile pictures, historical photos, or family portraits, transforming images to video in ways that they never imagined.”—D-ID
A few points to note:
- The gestures in the driver videos are real (performed mostly by employees of MyHeritage).
- The driver videos do not contain speech in order to prevent misuse, as is the case in deepfake videos. (More on deepfakes later in this post.)
- Users can also select a particular animation for the image (from a set of available animations).
- Sometimes, the animation also needs to generate features such as teeth or ears that were hidden or only partially visible in the original image. This can make the animation look a bit uncanny.
The transfer of the driver video onto the still image can also be seen in the video below (taken from D-ID’s site):
Is Deep Nostalgia Deepfake Technology?
The technology behind these Harry Potteresque animations is neither new nor especially complex.
If you haven’t noticed it by now, it isn’t too difficult to see how this technology resembles something you have already used on your phone or seen on the internet.
Do you recall FaceApp, Instagram’s face filters, Snapchat Face Swaps, or the Prisma app? All of these use similar deep learning technology to produce striking transformations and manipulations of images. Basically, the goal is to take a target image and synthesize an image or video that did not exist before. It is not hard to see that Deep Nostalgia is a highly refined form of deepfake technology.
The striking difference is that deepfake technology usually requires more than one target image to produce a deepfake video (plus audio), while Deep Nostalgia achieves its output with just one target image; in other words, the algorithm is a few shots leaner.
In fact, in 2019 researchers at the Samsung AI center published a paper titled “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models” on how deepfakes could be generated using fewer images. You can have a look at the results and overview of the paper in the video shared below:
Here is one more example of a deepfake that went viral on the internet last year, featuring Tom Cruise. Well, not exactly Tom Cruise, but a TikToker using deepfakes to pass as Tom Cruise playing golf and even performing magic tricks.
The Cruise fakes would be almost impossible to identify as fakes were it not for the voice.
GANs and Their Contribution to Deepfake Technology
Generating deepfakes usually involves leveraging deep generative models called Generative Adversarial Networks (GANs). Before we try to understand GANs, we should first understand what generative models are. Most of us are aware of the broad categorization of machine learning models into supervised and unsupervised models. However, models can also be categorized as discriminative or generative.
For example, a discriminative model uses a discriminant function f(x) to map each data point x onto a predicted class in a classification problem solved with a supervised machine learning algorithm. On the other hand, models that learn the distribution of the input in order to synthesize output, such as auto-complete or next-word prediction in a sentence, are called generative models.
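To make the distinction concrete, here is a minimal numpy sketch (a teaching example with made-up toy data, not code from any particular library) of a generative classifier: it models P(x | y) for each class, which lets it both classify via Bayes’ rule and sample new data points, whereas a discriminant function f(x) alone can only classify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: class 0 centered at -2, class 1 centered at +2.
x0 = rng.normal(-2.0, 1.0, 200)
x1 = rng.normal(+2.0, 1.0, 200)

# A generative model learns P(x | y) for each class
# (here: Gaussians with unit variance, fit by their means).
mu0, mu1 = x0.mean(), x1.mean()

def p(x, mu):
    # Gaussian density with unit variance.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# ...which lets it classify via Bayes' rule (equal priors)...
def classify(x):
    return int(p(x, mu1) > p(x, mu0))

# ...and also *generate* a brand-new sample from class 1,
# something a discriminant function f(x) alone cannot do.
new_sample = rng.normal(mu1, 1.0)

print(classify(2.5), classify(-2.5))  # 1 0
```

This is exactly the property deepfake models exploit: because they model the data distribution itself, they can synthesize images that were never observed.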
Two widely used examples of generative models are Naive Bayes and Gaussian Mixture Models (GMMs). Two popular deep learning models that can be used as generative models are the Restricted Boltzmann Machine (RBM) and the Deep Belief Network (DBN). The generative models most relevant to deepfakes are the Variational Autoencoder (VAE) and the Generative Adversarial Network (GAN).
Generative Adversarial Networks, or GANs as they are more popularly abbreviated, were first introduced in the paper “Generative Adversarial Nets” by Ian J. Goodfellow et al. in 2014. In simpler words, the GAN framework estimates generative models through an adversarial process in which two models, a generative model G and a discriminative model D, are trained simultaneously.
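To illustrate that adversarial process, here is a deliberately tiny numpy sketch of a GAN on toy 1-D data (an assumption-laden teaching example, nothing like a production deepfake model): the generator G is a single affine map, the discriminator D is a logistic regression, and the two take turns improving against each other with hand-derived gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_batch(n):
    # "Real" data: samples from N(4, 1).
    return rng.normal(4.0, 1.0, size=n)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = a*z + b maps noise z ~ N(0, 1) to fake samples.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c): probability that x is real.
w, c = 0.1, 0.0

lr, n = 0.01, 64
for step in range(2000):
    # --- Discriminator step: raise log D(real) + log(1 - D(fake)) ---
    x_real = real_batch(n)
    x_fake = a * rng.normal(size=n) + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # --- Generator step: raise log D(fake) (non-saturating loss) ---
    z = rng.normal(size=n)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    # Chain rule: d(log D)/dx = (1 - D) * w, and dx/da = z, dx/db = 1.
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(round(b, 2))  # b drifts toward the real mean of 4.0
```

At equilibrium the discriminator can no longer tell real samples from fakes. Real GANs replace the affine generator and logistic discriminator with deep networks trained by automatic differentiation, but the tug-of-war is the same.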
A survey paper titled “Deep Learning for Deepfakes Creation and Detection: A Survey” gives excellent insights into how GANs can be used not just for creating but also for detecting fakes. The following figure from the paper illustrates the concept of a deepfake creation model using two encoder-decoder pairs.
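The two encoder-decoder pairs in that figure can be sketched structurally in a few lines of numpy (untrained and purely illustrative; all names are made up for this example): one shared encoder learns a common latent representation of faces, each identity gets its own decoder, and the face swap happens by decoding person A’s latent code with person B’s decoder.

```python
import numpy as np

rng = np.random.default_rng(1)

class Linear:
    """A single dense layer standing in for a real neural network."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0, 0.1, (n_in, n_out))
    def __call__(self, x):
        return np.tanh(x @ self.W)

# One shared encoder compresses any face into a latent code...
encoder = Linear(64, 16)
# ...while each identity gets its own decoder, trained to
# reconstruct only that person's faces.
decoder_a = Linear(16, 64)
decoder_b = Linear(16, 64)

face_a = rng.normal(size=(1, 64))  # a flattened "face" of person A

# Training would pair encoder+decoder_a on A's photos and
# encoder+decoder_b on B's photos. The swap at inference time:
latent = encoder(face_a)   # A's expression/pose, identity-free
fake = decoder_b(latent)   # re-rendered with B's identity
print(fake.shape)          # (1, 64)
```

Because the encoder is shared, the latent code captures expression and pose rather than identity, which is what makes the decoder swap produce a convincing face transfer once the networks are actually trained.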
Other more common applications of GANs include:
- Image Synthesis
- Creating Art
- Image Translation
Visit our blog post on deepfakes for more details.
Is the Tech Behind Deep Nostalgia an Ethical Concern?
Given the potential and actual abuse of deepfake tech, there is indeed an ethical concern behind Deep Nostalgia. Deepfake tech has been a point of contention in the industry for years due to its use in altered pornography and in fake videos and voices that have endangered financial and national institutions in many countries. Defamation is another big risk associated with this tech.
In fact, MyHeritage is aware of these issues and asks users to be cautious and obtain permission before uploading images of people who are alive. The absence of sound in the animations is another way of pre-empting abuse of the service. Having said that, it is indeed possible with existing tech to produce a video-like animation with sound, as showcased in this advertisement in which a “driver” creates an animation featuring Abraham Lincoln (in color!).
Deepfakes indeed have the potential to damage public trust and security. Their existence also enables plausible deniability, where an accused person might claim that video footage or other evidence is fake. As the technology improves and produces results that are near-real and hard to detect as fake, it is bound to create fear that it could be used to manipulate and hurt people in general, not just celebrities and public figures.
Even though the danger posed by such technology is quite real, it is hard to resist when it presents itself as the moving, blinking, smiling face of a long-departed loved one.
Prepare for the Future of Machine Learning With Springboard
The Springboard Machine Learning Engineering Career Track gives you 14 real-world projects that you can add to your machine learning portfolio and use to impress a potential employer. You will also get 1:1 mentorship from industry experts and dedicated career coaching to help you land your dream job! Learn more about it today.