Chen says that although content moderation policies at Facebook, Twitter, and others managed to filter out some of the most prominent English-language misinformation, the system often misses such content when it is in other languages. That work instead had to be done by volunteers like his team, who sought out misinformation and were trained to defuse it and reduce its spread. “These mechanisms for capturing certain words and things don’t necessarily capture that disinformation and misinformation when it’s in another language,” he says.
Google’s translation services and technologies, such as Translatotron and real-time translation headphones, use artificial intelligence to convert between languages. But Xiong finds these tools inadequate for Hmong, a deeply complex language in which context is incredibly important. “I think we’ve become very complacent and dependent on advanced systems like Google,” Xiong says. “They claim to be ‘language accessible,’ and then I read it and it says something completely different.”
(A Google spokesperson acknowledged that smaller languages pose a “more difficult translation task” but said that the company has “invested in research that is particularly beneficial in low-resource language translation,” using machine learning and community feedback.)
All the way down
The challenges of language online go beyond the US, and they go down, quite literally, to the underlying code. Yudhanjaya Wijeratne is a researcher and data scientist at the Sri Lankan think tank LIRNEasia. In 2018, he began monitoring bot networks whose activity on social media encouraged violence against Muslims: in February and March of that year, a string of riots by Sinhalese Buddhists targeted Muslims and mosques in the cities of Ampara and Kandy. His team documented the bots’ “hunting logic,” cataloged hundreds of thousands of Sinhalese social media posts, and took the findings to Twitter and Facebook. “They would say all sorts of nice, well-intentioned things, basically canned statements,” he says. (In a statement, Twitter says it uses human review and automated systems “to apply our rules impartially to all people in the service, regardless of background, ideology, or political spectrum.”)
When contacted by MIT Technology Review, a Facebook spokesperson said the company had commissioned an independent human rights assessment of the platform’s role in the violence in Sri Lanka, which was published in May 2020, and had made changes after the attacks, including hiring content moderators who speak Sinhalese and Tamil. “We implemented proactive hate speech detection technology in Sinhalese to help identify potentially violating content more quickly and effectively,” they said.
As the bot behavior continued, Wijeratne grew skeptical of the platitudes. He decided to study the code libraries and software tools the companies were using, and found that mechanisms to monitor hate speech in most languages other than English had not yet been built.
“A large part of the research, in fact, has not yet been done for many languages like ours,” Wijeratne says. “What I can do with three lines of code in Python for English literally took me two years of studying 28 million Sinhalese words to build the basic corpora, build the basic tools, and then get things to the level where I could potentially do that kind of text analysis.”
After suicide bombers targeted churches in Colombo, the capital of Sri Lanka, in April 2019, Wijeratne built a tool to study hate speech and misinformation in Sinhala and Tamil. The system, called Watchdog, is a free mobile app that aggregates news and attaches warnings to false stories. The warnings come from volunteers trained in fact-checking.
Wijeratne stressed that this work goes beyond translation.
“Many of the algorithms that are frequently cited in research, especially in natural language processing, show excellent results for English,” he says. “And yet many of the same algorithms, even when used on languages that are only a few degrees of difference apart (whether they are West Germanic or from the Romance tree of languages), can give completely different results.”
Natural language processing is the basis of automated content moderation systems. In 2019, Wijeratne published a paper that examined the discrepancies in their accuracy across different languages. He argues that the more computational resources exist for a language, such as datasets and web pages, the better the algorithms can work. Languages from poorer countries or communities are at a disadvantage.
“If you’re building, say, the Empire State Building for English, you have the blueprints. You have the materials,” he says. “You have everything at hand and all you have to do is put it together. For every other language, you don’t have the blueprints.
“You have no idea where the concrete will come from. You have no steel and no workers. So you’re going to be sitting there laying one brick at a time and hoping that maybe your grandson or your granddaughter can complete the project.”
The movement to provide those blueprints is known as language justice, and it is not new. The American Bar Association describes language justice as a “framework” that preserves people’s rights “to communicate, understand, and be understood in the language in which they prefer and feel most articulate and powerful.”