This AI can create a video of Barack Obama saying anything

'Highly realistic' ... a new computer program can create simulations of people speaking. Image: REUTERS/Fabrizio Bensch

Charlotte Edmond
Senior Writer, Forum Agenda

In news that has made pranksters around the world pay attention, there is now a computer program that can create a realistic simulated video of someone speaking.

Researchers at the University of Washington have demonstrated the technique by creating a lip-synced video of former US president Barack Obama that blends existing audio and footage.

The program uses artificial intelligence (AI) to match audio of a person speaking with realistic mouth shapes, which it then grafts onto an existing video. After analysing millions of video frames of stock footage to learn how mouth shapes correspond to sound patterns, the program can produce highly realistic simulations.
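To give a very rough sense of how such a pipeline fits together – this is a toy sketch in Python, not the researchers' actual system, and every function and variable name here is hypothetical – the idea is to turn the audio into per-frame features, map those features to mouth shapes, and composite the result onto existing footage:

```python
# Illustrative sketch only: a toy version of the audio-to-mouth-shape idea.
# The real research system learns the mapping from many hours of footage;
# here the "learned" weights are just random numbers.
import numpy as np

def audio_features(waveform, frame_len=400):
    """Chop the waveform into frames and summarise each one
    (a stand-in for real spectral features)."""
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)

def predict_mouth_shape(features, weights):
    """Map each audio frame to a small set of mouth landmark offsets.
    In the real system this mapping is learned from millions of frames."""
    return features @ weights  # shape: (n_frames, n_landmarks)

def composite(target_frames, mouth_shapes):
    """Graft the predicted mouth region onto the existing video frames.
    Here we just record the landmarks per frame instead of rendering pixels."""
    return [
        {"frame": f, "mouth_landmarks": m.tolist()}
        for f, m in zip(target_frames, mouth_shapes)
    ]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waveform = rng.standard_normal(16000)   # one second of fake audio
    weights = rng.standard_normal((2, 6))   # toy stand-in for a learned mapping
    feats = audio_features(waveform)
    mouths = predict_mouth_shape(feats, weights)
    video = composite(range(len(mouths)), mouths)
    print(f"Synthesised mouth shapes for {len(video)} frames")
```

In the published work, the middle step is a model trained on the hours of footage mentioned above, and the final step blends a rendered mouth region into each frame of the target video.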

Faking it in the film industry

The researchers say the technology has the potential to be used in special effects. Currently the process for audio-to-video conversion involves filming lots of people saying the same sentence and attempting to find a correlation between sounds and mouth shapes. As well as being tedious and time-consuming, it also creates what is known as the “uncanny valley” problem, where videos are fairly realistic, but not quite realistic enough. Instead of looking convincing, they tend to look creepy.

The technology could also improve the experience on poor-quality video calls and could help hearing-impaired people, allowing them to lip-read from video synthesized from over-the-phone audio.

The team also suggests that by reversing the process – feeding video into the program instead of just audio – they could potentially develop an algorithm to detect whether a video is real or faked.
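As a purely illustrative sketch of that idea – not the team's algorithm, and with made-up numbers throughout – a detector could compare the mouth shapes predicted from a clip's audio track with the mouth movements actually observed in its frames, and flag clips where the two disagree too much:

```python
# Toy sketch of the "reverse the process" idea: compare mouth shapes predicted
# from the audio with those actually observed in the video. A large mismatch
# could flag a possible fake. Purely illustrative.
import numpy as np

def mismatch_score(predicted_mouths, observed_mouths):
    """Mean distance between predicted and observed mouth landmarks per frame."""
    return float(np.mean(np.linalg.norm(predicted_mouths - observed_mouths, axis=1)))

def looks_faked(predicted_mouths, observed_mouths, threshold=1.0):
    """Flag the clip if audio and video disagree more than the threshold allows."""
    return mismatch_score(predicted_mouths, observed_mouths) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    predicted = rng.standard_normal((25, 6))            # derived from the audio track
    genuine = predicted + 0.05 * rng.standard_normal((25, 6))
    tampered = rng.standard_normal((25, 6))             # unrelated mouth motion
    print("genuine clip flagged:", looks_faked(predicted, genuine))
    print("tampered clip flagged:", looks_faked(predicted, tampered))
```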

The aim is to improve the algorithms so that they generalize across situations and can recognize a person's voice and speech patterns with less data – for example, with one hour of video to learn from instead of the current 14 hours.

The program is only capable of creating video from words spoken by the same person: you can’t yet put your words in someone else’s mouth.
