Emerging Technologies

This computer isn't perfect. So it understands speech as well as you

Keith Breene
Senior Writer, Forum Agenda

In a significant breakthrough for artificial intelligence, voice recognition software can now understand language as accurately as humans, although grasping the context behind it remains elusive.

Researchers at Microsoft have created software that has a word error rate of 5.9%, which is about the same as a human transcriber.

“The research milestone doesn’t mean the computer recognized every word perfectly. In fact, humans don’t do that, either. Instead, it means that the error rate – or the rate at which the computer misheard a word like ‘have’ for ‘is’ or ‘a’ for ‘the’ – is the same as you’d expect from a person hearing the same conversation,” Microsoft said in a blog post.
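To make the metric concrete, here is a minimal sketch of how a word error rate is commonly calculated: count the fewest word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, then divide by the number of words in the reference. The article does not describe how Microsoft scored its system, so the function name, the sample sentences and the simple lower-casing step are illustrative assumptions rather than the team's actual method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real evaluations also normalise punctuation,
    numbers and spelling variants before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edits needed to turn the reference words seen so far
    # (before the current one) into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty transcript
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match, or one word substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts that reuse the mishearings quoted above.
reference = "we have a plan for the weekend"
hypothesis = "we is the plan for the weekend"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

In this made-up example the transcript gets two of seven words wrong, an error rate of about 29%. A rate of 5.9% corresponds to roughly six errors for every 100 words spoken.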

The result has been edging closer for many years and comes just weeks after the same team reported that they had got the error rate down to a tantalising 6.3%.

Twenty years ago, the best published research system had a word error rate above 43%.

Neural networks

Both IBM and Microsoft cite the advent of deep neural networks, which are inspired by the biological processes of the brain, as a key reason for advances in speech recognition.

Computer scientists have for decades been trying to train computer systems to do things like recognize images and comprehend speech, but until recently those systems were plagued with inaccuracies.

The new Microsoft programme relies on these deep neural networks as well as specialized graphics processing units that allow the software to learn at speeds not previously possible.

Booming market

The milestone has far-reaching implications.

Recent research by Tractica forecast that voice recognition software licenses will pass 550 million worldwide by 2024. Consumer and healthcare uses are the strongest growth sectors but the technology has implications across multiple industries.

Annual Voice and Speech Recognition licences by region, 2015-2024

More to do

Researchers say more work is needed to improve the system in real-life settings, such as places where there is a lot of background noise. Identifying individual speakers when several people are talking at once is also part of longer-term research efforts.

And, as anyone who has spent time shouting at Siri, Cortana or Google Assistant will testify, there is still a lot of work needed to enable computers to not just understand which words are being spoken, but their meaning and context too. It will still be some time before computers can answer questions or follow instructions with the same accuracy as humans.

Harry Shum, who heads the Microsoft Artificial Intelligence and Research group, said: “It will be much longer, much further down the road until computers can understand the real meaning of what’s being said or shown.”

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.
