This AI can create a video of Barack Obama saying anything
![The hand of humanoid robot AILA (artificial intelligence lightweight android) operates a switchboard at the CeBIT computer fair in Hanover, March 5, 2013.](https://assets.weforum.org/article/image/large_IhLlaMoiHFOl3CfNyY8LprWU5iBygdnGo023i2_xGmk.jpg)
'Highly realistic' ... a new computer program can create simulations of people speaking Image: REUTERS/Fabrizio Bensch
In news that has made pranksters around the world pay attention, there is now a computer program that can create a realistic simulated video of someone speaking.
Researchers at the University of Washington demonstrated the technique by creating a lip-synced video of former US president Barack Obama that blends existing audio and footage.
![](https://assets.weforum.org/editor/Y16SyV8So7H3qiNGWnk7iNSTsaqus9hRjs0Rt04QfkQ.jpg)
The program uses artificial intelligence (AI) to match audio of a person speaking with realistic mouth shapes, which it then grafts onto an existing video. After analysing millions of video frames of stock footage to learn how mouth shapes correspond to sound patterns, the program can produce highly realistic simulations.
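In rough outline, the pipeline maps audio features to mouth shapes and composites those shapes onto target frames. The sketch below illustrates that flow only; every name, the feature sizes, and the linear "model" are illustrative stand-ins, not the researchers' actual system.

```python
# Hypothetical sketch of the audio-to-video pipeline described above.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_features, n_landmarks = 100, 20, 18

# Stand-in for features extracted from the input audio, one vector per
# output video frame.
audio_features = rng.normal(size=(n_frames, n_features))

# Stand-in for what the network learns from hours of footage: a mapping
# from audio features to (x, y) positions of mouth landmarks. Here it is
# just a random linear projection.
learned_weights = rng.normal(size=(n_features, n_landmarks * 2))

def synthesize_mouth_shapes(features, weights):
    """Predict one set of mouth landmarks per audio frame."""
    return (features @ weights).reshape(-1, n_landmarks, 2)

def graft_onto_video(frames, mouth_shapes):
    """Pair each synthesized mouth shape with a target frame. A real
    system warps and re-textures the mouth region; this stub only
    records which shape would be composited onto which frame."""
    return list(zip(frames, mouth_shapes))

target_video = [np.zeros((64, 64, 3)) for _ in range(n_frames)]
shapes = synthesize_mouth_shapes(audio_features, learned_weights)
output = graft_onto_video(target_video, shapes)
```

The key point the sketch captures is that the expensive part is learning the audio-to-mouth-shape mapping; once learned, applying it to new audio and grafting the result onto existing footage is comparatively cheap.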
Faking it in the film industry
The researchers say the technology has the potential to be used in special effects. Currently the process for audio-to-video conversion involves filming lots of people saying the same sentence and attempting to find a correlation between sounds and mouth shapes. As well as being tedious and time-consuming, it also creates what is known as the “uncanny valley” problem, where videos are fairly realistic, but not quite realistic enough. Instead of looking convincing, they tend to look creepy.
The technology could also improve the experience on poor-quality video calls and could help hearing-impaired people, allowing them to lip-read synthesized video generated from over-the-phone audio.
The team also estimates that by reversing the process, feeding video into the program instead of just audio, they could potentially develop an algorithm to detect whether a video is real or fake.
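One way to picture that reversal, as a hypothetical sketch rather than any detector the researchers describe: predict the mouth shapes the audio track should produce, compare them with the mouth shapes actually observed in the frames, and flag clips where the two disagree strongly.

```python
# Illustrative stand-in only: names and data are invented for this sketch.
import numpy as np

rng = np.random.default_rng(1)

def fakeness_score(predicted_shapes, observed_shapes):
    """Mean distance between the mouth shapes predicted from the audio
    and the shapes observed in the video; a large gap suggests the mouth
    in the frames was not driven by this audio."""
    return float(np.mean(np.linalg.norm(predicted_shapes - observed_shapes, axis=-1)))

predicted = rng.normal(size=(100, 18, 2))
# A genuine clip: observed mouth shapes closely track the prediction.
genuine = predicted + rng.normal(scale=0.01, size=predicted.shape)
# A tampered clip: mouth shapes unrelated to the audio.
tampered = rng.normal(size=predicted.shape)
```

Under this toy model, `fakeness_score(predicted, genuine)` stays near zero while `fakeness_score(predicted, tampered)` is much larger, which is the intuition behind using the synthesis model in reverse as a forgery check.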
The aim is to improve the algorithms so they generalize across situations and can learn a person's voice and speech patterns from less data: one hour of video, for example, instead of the current 14 hours.
The program is only capable of creating video from words spoken by the same person: you can’t yet put your words in someone else’s mouth.