There are four major areas where this technology is becoming prevalent: 1) Imagery, 2) Video, 3) Audio, 4) Writing. In each of these areas, personally identifiable characteristics, from a person's face and movements to their voice and writing style, can be replicated by this technology.
Video/Imagery
The technology for creating video and imagery of real people is relatively simple to use. Anyone with a reasonably powerful GPU can feed a few hundred images or videos of a target into a machine learning model known as a Generative Adversarial Network (GAN) to create realistic fake imagery or video of that individual. Minimal computing skills are required.
DeepFakes rely on a combination of technologies, most notably machine learning and neural networks. However, the specific technology that enables the creation of hyper-realistic fake content is the Generative Adversarial Network (GAN). GANs are a class of machine learning algorithms used in unsupervised learning (a branch of machine learning that does not require labelled data) and are implemented as a pair of neural networks contesting with each other: a generator produces candidate images while a discriminator tries to distinguish them from real ones. The technique was created by researcher Ian Goodfellow in 2014. Goodfellow now works at Google.
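To make the adversarial setup concrete, the sketch below shows a minimal GAN training loop in Python using PyTorch. The network sizes, the random placeholder "real" data and the training settings are illustrative only and are not taken from any particular DeepFake tool.

# Minimal GAN sketch (PyTorch): a generator and a discriminator trained
# against each other. Toy-sized networks and random "real" data stand in
# for the face images a DeepFake pipeline would use.
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 64, 32         # sizes chosen only for illustration

generator = nn.Sequential(                        # turns random noise into a "sample"
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh())

discriminator = nn.Sequential(                    # scores a sample as real (1) or fake (0)
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim)           # placeholder for real training images
    fake = generator(torch.randn(batch, latent_dim))

    # Discriminator step: learn to label real samples 1 and generated samples 0.
    d_loss = bce(discriminator(real), torch.ones(batch, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: learn to fool the discriminator into outputting 1.
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()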
The original technique was capable of generating photographs that looked authentic to human observers. In 2018, researchers at Stanford University further developed a technology called Deep Video Portraits, which allows a computer to transpose the expressions or actions of one person onto another, for example making someone in a video smile or frown.
At the same time, researchers at the University of California, Berkeley used GANs to develop software that could transpose whole actions from one person to another. A person who can't dance could suddenly appear to perform ballet; someone who doesn't know sign language could appear in a video signing perfectly.
Again in 2018, technology firm Nvidia announced that it had developed a way to create hyper-realistic imagery of
non-existent humans, using what they call a “style-based generator.”
Audio
Significant advances were made in 2018, most notably by Google, in creating human-sounding voice AI. Such an AI can act as your digital assistant, liaising with people on your behalf without them knowing that they are talking to a machine.
A number of companies have made significant advances in generating human voice outputs. There are many reasonable use cases, most notably in digital assistants like Siri or Alexa, where the user feels more comfortable engaging with a human voice rather than one that is distinctly robotic. People engage with, and develop loyalty to, the personality imbued in the voice itself.
In 2016, Adobe, the creator of Photoshop, previewed a new voice editing tool called VoCo. The tool needs just 20 minutes of voice data to recreate any message in that person's voice. In effect, that means it can generate any voice message in the voice of prominent politicians, business people, journalists, celebrities, etc.
The company Lyrebird have taken this even further by allowing users to create their own synthetic voices. This is useful for people who want to create chatbots, personalise audiobook readers or maintain voice privacy online, among other uses.
In 2015, a company called Replika (originally Luka) developed a service that uses voice data and AI to recreate loved ones after they die. The AI draws on voice samples and data from the deceased person's e-mails and videos to hold conversations in their voice. The company recently pivoted towards creating digital companions.
There are many innovators in the space, but undoubtedly it is Google that have been working at it the longest and have made the most meaningful advances. Google have long held the belief that voice AI is the significant chasm that needs to be crossed for AI to really take hold of users' lives. As far back as 2007, Google launched a directory enquiry service called GOOG-411, a free tool for connecting people to the phone numbers they needed. This allowed Google to analyse enormous amounts of voice patterns and ultimately fed into the voice capabilities of the Pixel phone. This resulted in controversy when it became clear that Android phones were recording the conversations of people close to the device (you can delete some of these conversations in your account settings).
Google have now moved a step further by combining their digital assistant capabilities with their voice technology to generate realistic human voice AI. The product is called Duplex and is already live. You may well be receiving calls from robots that sound like people, hoping to set up a meeting with you.
Writing
A small sample of a person's handwriting can now be used to produce a never-ending script of text in perfect replica, and signatures are easily fabricated. Natural Language Processing (NLP) - the science of interaction between humans and computers through natural language - combined with these neural networks can now create chatbots that mimic the writing style and logic of an individual.
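As a deliberately simple illustration of the underlying idea (real systems use neural language models rather than anything this crude), the sketch below builds a word-level Markov chain from a short writing sample and then generates new text with similar word-sequence statistics. The sample passage is a placeholder.

# Toy writing-style mimic: a word-level Markov chain built from a small
# sample of someone's writing. Real systems use neural language models,
# but the principle of learning word-sequence statistics is the same.
import random
from collections import defaultdict

sample = ("I remain convinced that the market will recover. "
          "I remain hopeful that our investors will see the value. "
          "The market rewards patience and the market punishes panic.")

transitions = defaultdict(list)                  # word -> words seen after it
words = sample.split()
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

def mimic(start, length=15):
    """Generate text that statistically resembles the writing sample."""
    out = [start]
    for _ in range(length):
        followers = transitions.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(mimic("I"))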
Companies like Adobe have been providing digital signatures for some time now, but they have traditionally required users to generate the signature themselves and upload it as an image. In 2016, scientists at University College London developed software that could mimic a person's handwriting perfectly. This would allow users to increase the personalisation of their correspondence or work by typing in their own handwriting.
In parallel, Chicago-based company Narrative Science are developing and using NLP tools that can create text in the tone
of voice of a company or an author. This may be the only way you ever get to read an ending to the Game of Thrones
series.
Earlier this year, researchers at OpenAI decided against releasing the full version of a text-generation model, GPT-2, due to concerns about how dangerous it may be in the wrong hands. The AI can generate coherent paragraphs of text indistinguishable from human-created content. It can also perform rudimentary reading comprehension, translation, question-answering and summarisation without the need for task-specific training. These AIs can reasonably be expected to replace many content-creating and reporting roles in the next decade.
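Smaller versions of the model have been made publicly available. As a hedged sketch of what such generation looks like in practice, the snippet below uses the open-source Hugging Face transformers library to continue a short prompt; the prompt and generation settings are illustrative.

# Generating a paragraph of text from a short prompt with a publicly
# released language model, via the Hugging Face "transformers" library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "In a statement released this morning, the minister said"
result = generator(prompt, max_length=80, num_return_sequences=1)
print(result[0]["generated_text"])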
New and emerging technology ecosystems in imagery, audio, text, social and artificial intelligence have lowered the
barrier to entry and enabled the creation and distribution of DeepFakes at scale. Source: L’Atelier BNP Paribas.
Blockchain
Amber Authenticate is software that runs on a device capturing content. The software creates hashes that are logged on the Ethereum blockchain. If anyone tampers with the video, the hashes will no longer correspond to the record on the blockchain, alerting users that the footage has been edited. This technology could be particularly useful for policing, where footage of police brutality can spark wide-scale protests and riots. The risk that someone could create a fake video, edit footage to remove a real incident, or insert a fabricated event into a video would be mitigated if the police were running this software on the device.
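A minimal sketch of the hash-and-verify idea is shown below. It is not Amber's actual implementation: the file name is a placeholder and the on-chain write is represented by a hypothetical call, but the principle is the same - hash the footage at capture time, anchor the hash on a blockchain, and later re-hash the file and compare.

# Sketch of hash-based video provenance. At capture time the device hashes
# the footage and anchors the hash on a blockchain; at verification time
# anyone can re-hash the file and compare. The on-chain call below is a
# hypothetical stand-in, not Amber Authenticate's real API.
import hashlib

def file_hash(path: str) -> str:
    """SHA-256 of the file, computed in chunks to handle large videos."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# At capture time, on the recording device (hypothetical on-chain write):
# anchor_on_chain(video_id, file_hash("bodycam_clip.mp4"))

def verify(path: str, anchored_hash: str) -> bool:
    """True only if the footage still matches the hash logged at capture."""
    return file_hash(path) == anchored_hash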
Image Fidelity
Software scans the image or footage to check that it hasn't been edited. This process assesses elements like light, shade and reflections to ensure that the photo is consistent with the laws of physics. It also estimates how many times a file has been compressed, which indicates how many times it has been saved and potentially altered. This methodology is likely to identify only rudimentary fakes.
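One simple version of the compression check is error level analysis: re-save the image at a known JPEG quality and look at how strongly different regions change, since areas that were pasted in or saved a different number of times often stand out. A sketch using the Pillow library is below; the file names and quality setting are placeholders.

# Error level analysis sketch (Pillow): re-save a JPEG at a fixed quality
# and measure how much each region changes under that re-compression.
import io
from PIL import Image, ImageChops

def error_level(path: str, quality: int = 90) -> Image.Image:
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)   # controlled re-compression
    resaved = Image.open(buffer)
    return ImageChops.difference(original, resaved)  # bright areas changed most

# ela = error_level("suspect_photo.jpg")
# ela.save("suspect_photo_ela.png")                  # inspect the result visually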
Corroboration Techniques
Technologies are being developed to assess the fidelity of an image or video against its metadata and against other images or video taken around the same time. This involves checking information such as the weather, lighting and cloud patterns at the time of capture to verify that the footage hasn't been doctored. It is then assessed against other footage taken in the vicinity at the time to ensure there are no discrepancies. This process of interrogation is thorough and has existed for years, but it is not feasible in all cases, for example where footage was captured indoors with no other participants present. The technique is also hard to scale, since assessing every image or video quickly enough to mitigate its effects remains highly inefficient.
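As an illustration of the metadata side of this process, the sketch below reads a photo's embedded capture time and GPS block and compares them against an independent record of conditions at that time and place. The weather lookup is a hypothetical helper standing in for whatever archive an investigator would actually query.

# Corroboration sketch: pull the capture time and GPS block out of a photo's
# EXIF metadata and check them against an independent record. The
# fetch_weather_record helper is hypothetical, not a real API.
from PIL import Image
from PIL.ExifTags import TAGS

def exif_metadata(path: str) -> dict:
    """Flattened EXIF tags keyed by their human-readable names."""
    raw = Image.open(path)._getexif() or {}
    return {TAGS.get(tag, tag): value for tag, value in raw.items()}

def is_consistent(path: str, fetch_weather_record) -> bool:
    meta = exif_metadata(path)
    taken_at = meta.get("DateTimeOriginal")       # e.g. "2019:04:01 14:32:10"
    gps = meta.get("GPSInfo")                     # raw GPS block, if present
    if not taken_at or not gps:
        return False                              # nothing to corroborate against
    record = fetch_weather_record(taken_at, gps)  # hypothetical archive lookup
    # A real check would compare lighting, cloud cover, shadows, etc.
    return record is not None and record.get("conditions_plausible", False)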
Target Assessment
A team of academics at the University at Albany, New York realised that there is something consistently off about DeepFake videos - the subjects don't blink. This makes sense: the data being fed to the neural networks rarely contains images of people blinking, so the generated subjects rarely, if ever, close their eyes.
This prompted new approaches that focus on assessing the subject of the image to ensure that the human characteristics they exhibit are consistent with reasonable expectations. This includes assessing skin tone, perspiration, breathing, blinking, heart rate, etc. While this has proven to be the most effective method for spotting fakes to date, the fakers are simultaneously incorporating these same characteristics into newer versions, which are getting harder to spot.
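A common way to implement the blink check is the eye aspect ratio: a ratio of vertical to horizontal distances between eye landmarks that drops sharply whenever the eye closes. The sketch below assumes the six landmark points per eye have already been extracted by a separate facial-landmark detector, which is omitted here.

# Blink-detection sketch via the eye aspect ratio (EAR). The six (x, y)
# landmark points around the eye are assumed to come from a separate
# facial-landmark detector run on each video frame.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: array of six (x, y) landmark points around one eye."""
    vertical_1 = np.linalg.norm(eye[1] - eye[5])
    vertical_2 = np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return (vertical_1 + vertical_2) / (2.0 * horizontal)

def blink_count(ear_per_frame, threshold=0.2, min_frames=2):
    """Count dips of the EAR below the threshold lasting a few frames."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    return blinks

# A genuine clip of a few minutes should contain dozens of blinks; a count
# of zero across thousands of frames is a red flag.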
US Defence agency DARPA have invested significant funding in video analysis to spot fakes. MediFor is a DARPA programme developing technologies to automatically assess the integrity of images and videos. Its ultimate goal is to integrate these technologies in an end-to-end media forensics platform that automatically detects manipulations, provides detailed information about how the manipulations were performed, and facilitates decisions about the use of any questionable image or video. Its current technology assesses information like screen brightness and the subject's heart rate, breathing and colouring. These traits are often invisible to the human eye but perceptible to a computer. Again, the fakers are course-correcting and improving the fakes to stay ahead.
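One way such invisible traits can be measured is remote photoplethysmography: the average green value of facial skin fluctuates slightly with each heartbeat, and a Fourier transform of that signal reveals a pulse rate, which a generated face often lacks. MediFor's own tools are not public, so the sketch below is only a generic illustration and assumes a cropped face region is available for each frame.

# Remote-pulse sketch: the mean green value of facial skin varies slightly
# with each heartbeat. Given a cropped face region per frame and the frame
# rate, a Fourier transform of that signal gives an estimated heart rate.
import numpy as np

def estimate_heart_rate(face_frames, fps: float) -> float:
    """face_frames: sequence of HxWx3 RGB arrays (cropped face per frame)."""
    signal = np.array([frame[:, :, 1].mean() for frame in face_frames])
    signal = signal - signal.mean()                      # remove the DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)    # frequencies in Hz
    band = (freqs >= 0.7) & (freqs <= 4.0)               # plausible pulse: 42-240 bpm
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0                                # beats per minute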
Targeting Bots
A number of tools and techniques have been developed to focus on the distribution part of the value chain, especially on social networks. Since 2016, Facebook has improved its account authentication, hired a new security team and eliminated its trending news feature, among other measures, with mixed success. In 2018, Twitter conducted a "bot sweep", a mass deletion of automated bot accounts, and introduced new rules requiring bot developers to undergo a comprehensive vetting process before they can gain access to Twitter's API. Multiple tools have also been created to spot bots, including "Botornot" - an online service that helps users identify bots - and Factcheck.me, which tracks bot activity, amplified images and viral links. These measures have removed millions of bot accounts and deterred malicious bot traffic, but their effects tend to be short-lived as botnets shift to more sophisticated tactics using granular metadata.
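Detection services of this kind typically score accounts on simple behavioural features before applying heavier machine-learning models. The sketch below shows the flavour of such heuristic scoring; the feature names and thresholds are made up for illustration and are not those of any real service.

# Illustrative bot-scoring heuristic. Real services combine hundreds of
# features with trained classifiers; these thresholds are invented.
def bot_score(account: dict) -> float:
    score = 0.0
    if account["tweets_per_day"] > 100:              # inhumanly high posting rate
        score += 0.4
    if account["account_age_days"] < 30:             # very new account
        score += 0.2
    if account["followers"] < 10 and account["following"] > 1000:
        score += 0.2                                 # follows many, followed by few
    if account["default_profile_image"]:             # never personalised the profile
        score += 0.2
    return min(score, 1.0)                           # 0 = likely human, 1 = likely bot

suspect = {"tweets_per_day": 250, "account_age_days": 12,
           "followers": 3, "following": 4200, "default_profile_image": True}
print(bot_score(suspect))                            # -> 1.0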
Whilst blockchain-based provenance and integrity solutions will be useful in instances of litigation, they fall well short of being able to correct the spread of fake news and fake content in the long run. Most technical measures tend to be reactive, focusing on mitigation rather than prevention. In the absence of comprehensive, end-to-end solutions, technical countermeasures tend to become less effective over time, as DeepFake creation and distribution technologies and strategies evolve and adjust. This is likely to remain the case at least in the short term; corroborating any statement will always take longer than simply making it.