This AI can create a video of Barack Obama saying anything
![The hand of humanoid robot AILA (artificial intelligence lightweight android) operates a switchboard at the CeBIT computer fair in Hanover, March 5, 2013.](https://assets.weforum.org/article/image/large_IhLlaMoiHFOl3CfNyY8LprWU5iBygdnGo023i2_xGmk.jpg)
'Highly realistic' ... a new computer program can create simulations of people speaking Image: REUTERS/Fabrizio Bensch
In news that has made pranksters around the world pay attention, there is now a computer program that can create a realistic simulated video of someone speaking.
Researchers at the University of Washington demonstrated the technique by creating a lip-synced video of former US president Barack Obama that blends existing audio and footage.
![](https://assets.weforum.org/editor/Y16SyV8So7H3qiNGWnk7iNSTsaqus9hRjs0Rt04QfkQ.jpg)
The program uses artificial intelligence (AI) to match audio of a person speaking with realistic mouth shapes, which it then grafts onto an existing video. After analysing millions of video frames of stock footage to learn how mouth shapes correspond to sound patterns, the program can produce highly realistic simulations.
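In rough outline, the pipeline maps audio features to mouth shapes and composites those shapes onto target frames. The sketch below illustrates that flow only; every name, the feature sizes, and the linear "model" are illustrative stand-ins, not the researchers' actual system.

```python
# Hypothetical sketch of the audio-to-video pipeline described above.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_features, n_landmarks = 100, 20, 18

# Stand-in for features extracted from the input audio, one vector per
# output video frame.
audio_features = rng.normal(size=(n_frames, n_features))

# Stand-in for what the network learns from hours of footage: a mapping
# from audio features to (x, y) positions of mouth landmarks. Here it is
# just a random linear projection.
learned_weights = rng.normal(size=(n_features, n_landmarks * 2))

def synthesize_mouth_shapes(features, weights):
    """Predict one set of mouth landmarks per audio frame."""
    return (features @ weights).reshape(-1, n_landmarks, 2)

def graft_onto_video(frames, mouth_shapes):
    """Pair each synthesized mouth shape with a target frame. A real
    system warps and re-textures the mouth region; this stub only
    records which shape would be composited onto which frame."""
    return list(zip(frames, mouth_shapes))

target_video = [np.zeros((64, 64, 3)) for _ in range(n_frames)]
shapes = synthesize_mouth_shapes(audio_features, learned_weights)
output = graft_onto_video(target_video, shapes)
```

The key point the sketch captures is that the expensive part is learning the audio-to-mouth-shape mapping; once learned, applying it to new audio and grafting the result onto existing footage is comparatively cheap.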
Faking it in the film industry
The researchers say the technology has the potential to be used in special effects. Currently the process for audio-to-video conversion involves filming lots of people saying the same sentence and attempting to find a correlation between sounds and mouth shapes. As well as being tedious and time-consuming, it also creates what is known as the “uncanny valley” problem, where videos are fairly realistic, but not quite realistic enough. Instead of looking convincing, they tend to look creepy.
The technology could also improve the experience on poor-quality video calls and could help hearing-impaired people, allowing them to lip-read synthesized video generated from over-the-phone audio.
The team also estimates that by reversing the process, feeding video into the program instead of just audio, they could potentially develop an algorithm to detect whether a video is real or fake.
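One way to picture that reversal, as a hypothetical sketch rather than any detector the researchers describe: predict the mouth shapes the audio track should produce, compare them with the mouth shapes actually observed in the frames, and flag clips where the two disagree strongly.

```python
# Illustrative stand-in only: names and data are invented for this sketch.
import numpy as np

rng = np.random.default_rng(1)

def fakeness_score(predicted_shapes, observed_shapes):
    """Mean distance between the mouth shapes predicted from the audio
    and the shapes observed in the video; a large gap suggests the mouth
    in the frames was not driven by this audio."""
    return float(np.mean(np.linalg.norm(predicted_shapes - observed_shapes, axis=-1)))

predicted = rng.normal(size=(100, 18, 2))
# A genuine clip: observed mouth shapes closely track the prediction.
genuine = predicted + rng.normal(scale=0.01, size=predicted.shape)
# A tampered clip: mouth shapes unrelated to the audio.
tampered = rng.normal(size=predicted.shape)
```

Under this toy model, `fakeness_score(predicted, genuine)` stays near zero while `fakeness_score(predicted, tampered)` is much larger, which is the intuition behind using the synthesis model in reverse as a forgery check.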
The aim is to improve the algorithms so they generalize across situations and can learn a person's voice and speech patterns from less data: one hour of video, for example, instead of the current 14 hours.
The program is only capable of creating video from words spoken by the same person: you can’t yet put your words in someone else’s mouth.