Emerging Technologies

DeepMind says it's developed much more realistic computer speech

A technician adjusts the "Inmoov" robot from Russia during the "Robot Ball" scientific exhibition in Moscow on May 17, 2014.

WaveNet focuses on the sound waves being produced as opposed to the language itself. Image: REUTERS/Sergei Karpukhin

Sam Shead
Technology Reporter, Business Insider

Google DeepMind claims to have significantly improved computer-generated speech with its AI technology, paving the way for sophisticated talking machines like those seen in sci-fi films such as "Her" and "Ex Machina."

The London-based research lab, acquired by Google in 2014 for a reported £400 million, announced on Thursday that it has developed a talking computer programme called "WaveNet" that halves the quality gap that currently exists between human speech and computer speech.
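One way to read a claim about halving the quality gap is as the share of the distance between an existing system's score and a human score that the new system covers. The sketch below assumes listener ratings on a 1 to 5 scale; the function name and the numbers are hypothetical, chosen only to illustrate the arithmetic, and are not DeepMind's published figures.

```python
def gap_closed(baseline: float, new_system: float, human: float) -> float:
    """Return the fraction of the baseline-to-human gap covered by new_system."""
    gap = human - baseline
    if gap <= 0:
        raise ValueError("human score must exceed the baseline score")
    return (new_system - baseline) / gap


# Hypothetical ratings (illustrative only, not taken from DeepMind's results).
share = gap_closed(baseline=3.0, new_system=4.0, human=5.0)
print(f"Gap closed: {share:.0%}")
```

On this reading, a system that moves from 3.0 to 4.0 when humans score 5.0 has closed half of the gap, even though its absolute score rose by only one point.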

Although WaveNet sounds more like a human voice than existing artificial voice generators, known as "text-to-speech" (TTS) systems, it requires too much computing power to be practical, meaning Google won't be integrating it into its products any time soon, according to The Financial Times.

AI Landscape: Global Quarterly Financing History
Image: CB Insights

Aäron van den Oord, a research scientist at DeepMind, said: "Mimicking realistic speech has always been a major challenge, with state-of-the-art systems, composed of a complicated and long pipeline of modules, still lagging behind real human speech. Our research shows that not only can neural networks learn how to generate speech, but they can already close the gap with human performance by over 50%.

"This is a major breakthrough for text-to-speech systems, with potential uses in everything from smartphones to movies, and we're excited to publish the details for the wider research community to explore."

Unlike existing artificial voice generators, WaveNet focuses on the sound waves being produced as opposed to the language itself. It uses a neural network, a technology loosely modelled on the human brain, to analyse the raw waveform of an audio signal and model speech and other types of audio, including music.
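To give a feel for what working on a raw waveform means, here is a toy sketch of building audio one sample at a time, with each new sample computed from the samples before it. It is not WaveNet: a fixed two-tap rule stands in for the trained neural network, and the function names, the 440 Hz tone and the 16,000 samples-per-second rate are all assumptions made for illustration.

```python
import math


def next_sample(history: list[float], omega: float) -> float:
    """Stand-in for a trained network: predict the next sample from the last two."""
    return 2.0 * math.cos(omega) * history[-1] - history[-2]


def generate(n_samples: int, freq_hz: float = 440.0, sample_rate: int = 16000) -> list[float]:
    """Build a waveform sample by sample, feeding each output back in as input."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    wave = [0.0, math.sin(omega)]  # two seed samples to start the recursion
    while len(wave) < n_samples:
        wave.append(next_sample(wave, omega))
    return wave[:n_samples]


wave = generate(1000)
print(len(wave), "samples generated")
```

The fixed rule can only ever produce a pure tone. A learned model in its place is what would let the same loop produce speech or music, and running a large model once per sample is one reason this style of generation is computationally expensive.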

DeepMind published sample audio recordings of WaveNet talking in English and Mandarin, and it's easy to hear that the recordings are an improvement on Google Now, Amazon's Alexa, and Apple's Siri. The company also showed off some of the music that WaveNet has been able to produce after studying solo piano music on YouTube.

Like other AI systems, WaveNet requires vast quantities of existing data to train itself. DeepMind used Google's existing TTS datasets to do this.

DeepMind, which sits under Alphabet, Google's parent company, is best known for developing artificial intelligence systems that can master games like Space Invaders and Go. However, Google has been slow to integrate the company's technology into its products, with just one data centre efficiency project announced so far, albeit on a global scale.

For more details on WaveNet, take a look at Google DeepMind's academic paper.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.
