The reality may be some distance from the pleasant and inclusive image that many big tech brands present. For example, studies have suggested that facial recognition has racial and gender biases, and we already know that virtual assistants such as Alexa respond better to men’s voices, as well as perpetuating the harmful stereotype that those with feminine voices and names are ‘subservient and eager to please’.
As technology becomes an ever more integral part of everyone’s lives, it risks eroding the rich diversity of our culture. If tech brands continue to produce products that don’t account for people’s differences, the world could look very different just ten years from now.
Come again?
There have been numerous – often comedic – examples of people with different accents or backgrounds being frustrated by a Google Assistant or Alexa that doesn’t understand them. These candid stories may be entertaining, but they point to something much darker and more serious, because for many people this is a lived reality with real consequences. Alexa not understanding that you need milk on your shopping list is one thing; being mistranscribed in court or on a 999 call is another.
The global speech-to-text API market was valued at $1,321.5M in 2019 and is projected to reach $3,036.5M by 2027. Despite this significant growth, much speech recognition technology still struggles to transcribe accurately, to understand a wide range of languages or to account for people with speech impediments and non-native accents.
People still seem willing to play the algorithmic game of adapting their voice, tone and style in order to be recognised. In the short term, this may be just a mild frustration, but in the long term it could put lesser-spoken languages and accents at risk of extinction.
Accessibility for all
Regardless of ethnicity, gender, age or social background, everybody has the right to be understood. Technology that transcribes speech serves a vital function, not only in comprehension but also in the preservation of culture. Understanding every voice is a way to acknowledge, recognise and preserve cultures for future generations. UNESCO regularly publishes a list of languages classified from vulnerable to extinct, showing the sheer number of languages at risk of being forgotten, from Sicilian to Yiddish.
Technology has already shown itself to be a huge enabler of cultural preservation – most recently in protecting Ukraine’s cultural heritage during the current conflict – and language preservation should be part of this too. Closer to home, despite Welsh being the second fastest growing language in the UK, it is not available on most large speech-to-text platforms such as Alexa or Google Assistant. The Welsh Government aims to have 1 million speakers of the language by 2050, but if technology does not support this ambition, the goal will likely fail to materialise.
Active inclusion
By knowingly or unknowingly allowing technologies such as speech-to-text platforms to create a single source of truth, we are reshaping what future societies could look like. It is a future determined by the small fraction of the global population in which power, tools and resources are often concentrated.
We often discuss the dangers of monopolies within society, but these conversations should extend beyond assets and services to language and culture.
Hope for the future
However, things are starting to get better. For example, tech companies are relying less on single datasets and are instead drawing on a wide range of voices to ensure that more people from more communities are understood. There are also initiatives such as the Speech Accessibility Project, which aims to create a private, de-identified dataset that can be used to train machine learning models to better understand a variety of speech patterns. This type of work will help increase diversity, preserve languages and provide more inclusive software for all.
Language is the bedrock of culture. It tells the history of a nation and its people, accounts for their present and cements their future. Language is a story as well as a crucial form of communication. As such, technology – in whatever form – should do its part in sustaining rich cultural nuances. Building on the positive work that has already been done will help ensure that technology does not leave anyone behind.
John Hughes, Accuracy Team Lead at Speechmatics.