Amazon Takes Top Prize at Interspeech 2020

Tackling deep noise suppression in a society now dominated by virtual meetings


Published: October 14, 2020

Ian Taylor, Editor

According to Arvindh Krishnaswamy, Principal Scientist and Senior Manager, AWS Audio/Video Technology and Services, a team of Amazon scientists recently took the top prize for its non-real-time speech suppression system at Interspeech 2020. Its real-time system finished third overall and second among real-time systems while using just four percent of a CPU core. In a blog post, Krishnaswamy shared some context, noting:

“In electronic voice communication, noise and reverberation not only hurt intelligibility but also cause listener fatigue through the effort required to understand poor-quality speech for long periods of time.”

He added that, as we spend more time in remote meetings during the COVID-19 pandemic, the issue is more relevant and prevalent than ever before. The Deep Noise Suppression Challenge, held at this year’s Interspeech conference, was an attempt to solve that very issue. To meet real-world requirements, Krishnaswamy said, the AWS team restricted its real-time entry to just four percent of CPU use (measured on an Intel i7-8565U core). “This is far less than the maximum allowed by the competition, yet our real-time entry finished very close (0.03 mean opinion score) to first place, beating the rest of the non-real-time entries as well.”

AWS has launched the technology that won the Interspeech competition. It is now available to AWS customers who use the Amazon Chime clients for Apple macOS and Microsoft Windows for video conferencing. The feature is not only practical but also free to try for thirty days with an Amazon Chime Pro trial. “Classical speech enhancement algorithms use hand-tuned models of speech and noise, generally assuming that noise is constant,” he added.

In theory, this approach can work reasonably well for certain noises, such as cars, in environments “that aren’t too noisy or reverberant.” He added that it unfortunately often fails for non-stationary noises such as the clicking of keys on a keyboard, which has led researchers to turn to deep learning.
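The constant-noise assumption Krishnaswamy describes is easiest to see in spectral subtraction, a textbook classical enhancement method. The sketch below (illustrative only, not AWS's algorithm; the function name and parameters are our own) estimates one fixed noise spectrum from the leading frames and subtracts it everywhere, which is exactly why a sudden key click slips through:

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=5):
    """Classical spectral subtraction: estimate a single, fixed noise
    magnitude spectrum from the first few frames (assumed speech-free),
    then subtract it from every frame. The fixed estimate is the
    'constant noise' assumption, and it is why this method struggles
    with non-stationary noise such as keyboard clicks."""
    # Split the signal into non-overlapping frames for simplicity.
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    # Noise model: average magnitude over the leading, noise-only frames.
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract the noise magnitude, floor at zero, and keep the noisy phase.
    clean_mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spectra)),
                         n=frame_len, axis=1)
    return clean.reshape(-1)
```

A real implementation would use overlapping windowed frames and an oversubtraction factor, but even then the core weakness remains: the noise estimate is frozen, so any noise that changes faster than the estimate is updated leaks into the output.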

“Speech enhancement requires not only extracting the original speech from the noise and reverberation but doing so in a way that the human ear perceives as natural and pleasant. This makes automated regression testing difficult and complicates the design of deep-learning speech enhancement systems.”

AWS’s real-time system is said to take advantage of human perception by optimizing the perceptual characteristics of the speech (its spectral envelope and voicing) while disregarding other, perceptually immaterial characteristics. As a result, the algorithm can deliver high speech quality while remaining efficient.
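The “spectral envelope” mentioned above is the broad shape of a frame’s spectrum with the fine harmonic structure removed. One standard way to estimate it (a generic textbook technique, not a description of AWS’s system; the function name and lifter size are our own) is cepstral smoothing:

```python
import numpy as np

def spectral_envelope(frame, n_ceps=20):
    """Estimate a frame's spectral envelope by cepstral smoothing:
    keep only the low-quefrency cepstral coefficients, which capture
    the broad spectral shape, and discard the high-quefrency ones,
    which carry the fine harmonic (pitch) detail."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)
    cepstrum = np.real(np.fft.ifft(log_mag))
    # Lifter: keep only the low-quefrency coefficients (symmetric halves).
    lifter = np.zeros_like(cepstrum)
    lifter[:n_ceps] = 1.0
    lifter[-(n_ceps - 1):] = 1.0
    # Back to the frequency domain: a smoothed log-magnitude envelope.
    return np.real(np.fft.fft(cepstrum * lifter))
```

An enhancement system that scores this envelope (plus a voicing measure) rather than the raw waveform can ignore perceptually irrelevant detail, which is the general idea behind perceptually motivated objectives.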

Many companies, including Cisco, are tackling the background noise that so often disrupts virtual meetings. Cisco seeks to do so through its acquisition of BabbleLabs, and noise suppression technology is already used by its competitors Google and Microsoft. Many physical devices, including noise-canceling headsets, also attempt to solve this real-world problem. With all these tools readily available, it may be only a matter of time before advanced technologies such as artificial intelligence and machine learning eliminate background noise altogether.


Artificial Intelligence, Customer Experience, Digital Transformation, User Experience