By Paresh Dave
(Reuters) – Translation tools from Alphabet Inc’s Google and other companies could be contributing to significant misunderstanding of legal terms with conflicting meanings such as “enjoin,” according to research due to be presented at an academic workshop on Monday.
Google’s translation software turns an English sentence about a court enjoining violence, or banning it, into one in the Indian language of Kannada that implies the court ordered violence, according to the new study (https://www.youtube.com/watch?v=eKRpiMBlu40).
“Enjoin” can mean either to order an action or to prohibit it. Mistranslations also arise with other contronyms, or words with contradictory meanings depending on context, including “all over,” “eventual” and “garnish,” the paper said.
Google said machine translation “is still just a complement to specialized professional translation” and that it is “continually researching improvements, from better handling ambiguous language, to mitigating bias, to making large quality gains for under-resourced languages.”
The study’s findings add to scrutiny of automated translations generated by artificial intelligence software. Researchers have previously found that programs that learn to translate by studying non-diverse text perpetuate historical gender biases, such as associating “doctor” with “he.”
The new paper raises concerns about a popular method companies use to broaden the vocabulary of their translation software. They translate foreign text into English and then back into the foreign language, aiming to teach the software to associate similar ways of saying the same phrase.
Known as back translation, this process struggles with contronyms, said Vinay Prabhu, chief scientist at authentication startup UnifyID and one of the paper’s authors.
When the researchers translated a sentence about a court enjoining violence into the 109 languages supported by Google’s software, most results were wrong. Translated back into English, 88 said the court called for violence and only 10 correctly said the court prohibited it. The rest had other problems.
Another researcher, Abubakar Abid, tweeted in December that he had found possible bias in back translation through Turkish. Using Google, short phrases with “enjoin” came back with “people” and “Muslims” ordering violence but the “government” and “CIA” outlawing it.
The new paper said translation issues could lead to severe consequences as more businesses use AI to generate or translate legal text. One example in the paper is a news headline about nonlethal domestic violence in which translation turned “hit” into “killed,” a potentially true but problematic association.
The authors also expressed concern about the lack of warnings and confidence scores in tools from Google and others. In support materials, Google warns it may not have the best solution “for specialized translation in your own fields.”
(Reporting by Paresh Dave in Oakland, Calif.; Editing by Matthew Lewis)