According to a study from Northeastern University and Pomona College, mistakes in auto-captions are letting down users of many video conferencing and social media apps.
Researchers worked with Consumer Reports (CR) to test auto-captions on seven popular platforms such as Zoom, Facebook, Google Meet and YouTube.
Mistakes were found on all of the platforms; according to Consumer Reports, some of them were getting around one in 10 words wrong.
The video conferencing platforms that the study looked at were BlueJeans, Cisco Webex, Google Meet, and Zoom.
Consumer Reports also included Microsoft Stream, the company’s video streaming service, which uses the same voice technology as Microsoft Teams.
The study found that Webex had more mistakes than Google Meet, but within each tested platform, there were considerable differences.
Consumer Reports said Zoom’s “very best transcription had just two errors per 100 words, while at its worst the software mistranscribed nearly every third word”.
Kaveh Waddell, of Consumer Reports, said: “We controlled for the speaker’s age, gender, race and ethnicity, first language, and speech rate. As it turned out, only gender and first language status independently affected the variation in transcription mistakes.

“Though the accuracy differences we found between groups of speakers may not seem large, they can have a real impact on comprehension.

“Take Zoom’s average accuracy gap between a native and non-native speaker: It’s 3.6 per cent, which looks like a small number.

“But imagine if you misunderstood three or four extra words out of every hundred, on top of the roughly eight per cent of words that the auto-captions already bungled, according to our study.

“English is often spoken at about 150 words per minute, so those mistakes can pile up fast.”
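As a rough back-of-the-envelope check of those figures, the sketch below multiplies the rates quoted above by a typical speaking pace. The numbers (150 words per minute, an ~8 per cent baseline error rate, a ~3.6 percentage point gap for non-native speakers) come from the article; the calculation itself is illustrative and not part of Consumer Reports’ methodology.

```python
# Illustrative arithmetic only; the constants are the figures quoted in
# the article, not an official Consumer Reports calculation.

WORDS_PER_MINUTE = 150          # typical spoken-English pace cited above
BASELINE_ERROR_RATE = 0.08      # ~8 words miscaptioned per 100
NON_NATIVE_GAP = 0.036          # extra ~3.6 percentage points

def errors_per_minute(rate, wpm=WORDS_PER_MINUTE):
    """Expected number of miscaptioned words per minute at a given error rate."""
    return rate * wpm

native = errors_per_minute(BASELINE_ERROR_RATE)
non_native = errors_per_minute(BASELINE_ERROR_RATE + NON_NATIVE_GAP)

print(f"native speaker:     ~{native:.0f} miscaptioned words/min")
print(f"non-native speaker: ~{non_native:.0f} miscaptioned words/min")
```

At those rates a native speaker would see roughly a dozen miscaptioned words every minute, and a non-native speaker around five more on top of that, which is why small-looking percentage gaps add up quickly over a meeting.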
Responding to Consumer Reports, Microsoft said the findings aligned with its internal testing, which also revealed lower accuracy when transcribing men and second-language English speakers.
A Zoom spokesperson told CR: “We’re continuously enhancing our transcription feature to improve accuracy toward a variety of factors, including English dialects and accents.”
Google said it was working to “improve the accuracy of live captions and translations so even more users can participate and stay engaged using Google Meet”.
Cisco told CR that its auto-caption testing puts Webex ahead of two “best-in-class speech recognition engines” but wouldn’t tell the publication what those products were.
In Consumer Reports’ study, Webex had the highest error rate; a Cisco spokesperson suggested the discrepancy may be because Webex’s captions are fine-tuned for video conferencing rather than for other scenarios.