Google has created an AI that sounds indistinguishable from humans

Jan 2, 2018

This article is published in collaboration with Quartz.

Peter Kirn holds two coins between his fingers to create sounds with a synthesizer in Berlin, July 19, 2012. Kirn makes music with an unusual instrument: his own body. The Kentucky native pinches two electrically charged pennies connected to a laptop, which measures the electrical currents of his body and synthesises them into melodic sound. The laptop screen shows a visualisation of the sound he is creating. REUTERS/Thomas Peter

Google has created an AI and trained it to sound indistinguishable from humans.

Image: REUTERS/Thomas Peter

Dave Gershgorn

Artificial Intelligence Reporter, Quartz

Humans have officially given their voice to machines.

A research paper published by Google in December 2017, which has not been peer reviewed, details a text-to-speech system called Tacotron 2. The paper claims the system achieves near-human accuracy at generating speech audio from text.

The system is Google’s second official generation of the technology, and it consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet’s AI research lab DeepMind, which reads the chart and generates the corresponding audio.
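To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python that turns a test tone into a grid of time frames by frequency bins. It is an illustration only, not Google's code: the `spectrogram` function, the frame sizes, and the 1,000 Hz test tone are all assumptions chosen for the example, and Tacotron 2 itself predicts mel-scaled spectrograms with a neural network rather than computing them from existing audio.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    Hypothetical helper for illustration; frame_len and hop are arbitrary choices.
    """
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Naive discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(total))
        rows.append(row)
    return rows

SAMPLE_RATE = 8000  # samples per second (assumed for the example)
FRAME_LEN = 64

# A pure 1,000 Hz tone, 1,024 samples long.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(1024)]
spec = spectrogram(tone, frame_len=FRAME_LEN)

# Find the loudest frequency bin in the first time frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN

print(len(spec), "frames x", len(spec[0]), "bins; loudest frequency:", peak_hz, "Hz")
```

Each row of `spec` describes which frequencies are present during one short slice of time. That grid is the kind of "chart" a second network such as WaveNet would then have to turn back into a waveform.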

Image: Techonomy

You can listen to two samples of each sentence below. Keep in mind that, for each sentence, one sample is generated by AI and the other is a recording of a human hired by Google. We don’t know for sure which is which. (However, if you reveal the “page source” and look at the filenames of each on the Google research website, one is labeled “gen,” ostensibly to mark the generated sample.)

“George Washington was the first President of the United States.”

“That girl did a video about Star Wars lipstick.”

The Google researchers also demonstrate that Tacotron 2 can handle hard-to-pronounce words and names, as well as alter the way it enunciates based on punctuation and capitalization. For instance, capitalized words are stressed, as someone would do when indicating that a specific word is an important part of a sentence.

Here’s an example:

“The buses aren’t the problem, they actually provide a solution.”

“The buses aren’t the PROBLEM, they actually provide a SOLUTION.”

Unlike some core AI research the company does, this technology is immediately useful to Google. WaveNet, first announced in 2016, is now used to generate the voice in Google Assistant. Once readied for production, Tacotron 2 could be an even more powerful addition to the service.

However, the system is trained to mimic only a single female voice; to speak like a male voice or a different female voice, Google would need to train the system again.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.
