Is Google Okay With Minor Tweaks To Machine Translations? via @sejournal, @martinibuster
Google answers whether machine-translated content with only minor editing changes is still good enough. The post Is Google Okay With Minor Tweaks To Machine Translations? appeared first on Search Engine Journal.
Google’s October SEO office-hours answered whether it was okay to use automatically translated content that has been reviewed by a human and subjected to only minor editing changes.
Google SEO-Office Hours Episode
This episode of the Office-Hours hangout follows a new format where questions are submitted in writing and answers are subsequently given.
The Googlers answering the questions are Lizzi Sassman and John Mueller.
Lizzi Sassman (@okaylizzi) is a tech writer who “cares for” the Google Search Central documentation.
Unlike the previous live format, where the audience asked questions in real-time, this new format precludes the opportunity to ask follow-up questions.
This results in answers that closely echo Google’s documentation, with no chance of asking for clarification.
The person asking the question was concerned about their content that was machine translated from another language.
They employed human editors to review the content, which was regularly found to be acceptable and required only minor changes.
Naturally, the person asking the question is concerned about whether “minor tweaks” are enough to make the content acceptable for Google.
Lizzi Sassman answered the question with an answer that closely followed Google’s documentation.
Arguably, the answer could have been clarified with a follow-up question to determine if “minor tweaks” are good enough.
After all, the question asks explicitly if minor tweaks are good enough for Google.
Possibly the implied answer is to use your judgment about the quality of the translated content.
Judge for yourself.
Is Moderately Edited Machine Translated Content Acceptable?
They asked:
“A site uses machine translation to offer posts in other languages.
The content is reviewed by human translators and they’re often happy with the quality after the minor tweaks.
Is this okay for Google?”
Google’s Lizzi Sassman answered:
“Well that’s good to hear that the human translators are happy and this is totally fine for Google as long as there’s a human involved in the review process. That’s the key.
The thing you want to watch out for is making sure that the quality continues to be good and working well for the humans that are reading the content.”
The answer doesn’t specifically say if minor edits are fine, only that if the “human translators” are good with it, then it should be good for Google.
Could it be that Google doesn’t check if the content is machine-translated but relies on standard content quality signals?
We don’t know.
The new Office-Hours format does not provide the person asking the question an opportunity to ask a follow-up question.
Google Spam Policies
Google’s developer documentation about spammy content mentions automated text translation tools and explicitly says that it is spam except when there is a human element involved.
This is what Google’s documentation states:
“Examples of spammy auto-generated content include:
Text translated by an automated tool without human review or curation before publishing”
So it’s clear from Google’s published guidelines that as long as a human reviews or curates the machine-translated content before it is published, Google will be okay with it.
Additionally, in a Google Office-Hours video from April 2022, John Mueller mentioned how AI-generated content is considered spam and then mentioned auto-translated content.
Mueller spoke about AI content-generating tools and compared them to auto-translation tools.
At the 24:55 minute mark of the April 2022 Office-Hours video, Mueller said:
“I think, I don’t know, over time, maybe this is something that will evolve, in that it will become more of a tool for people.
Kind of like you would use machine translation as a basis for creating a translated version of a website.
But you still… essentially work through it manually.”
Why Should A Human Check Auto-Translated Content?
As mentioned above, Google’s concern is that the content it links to from the search engine results pages (SERPs) is high quality and that users will be happy with it.
Something that wasn’t discussed is that translated content contains signatures that a translation detection algorithm can identify.
Detecting machine-translated content is something that’s been researched for many years.
A research paper from 2021 (Machine Translated Text Detection Through Text Similarity with Round-Trip Translation) states that content translated from one language to another can be difficult for humans to detect.
For example, using 100 translated texts, human raters could only identify just over half of the translated texts.
The researchers noted:
“The average accuracy was 53.3% (55.0% for the native speakers and 52.0% for the nonnative speakers), which was close to random.”
The approach, called Text Similarity With Round-Trip Translation (TSRT), outperformed the human raters and scored higher than the state-of-the-art translation detectors when the paper was published in 2021.
Remarkably, this technique can detect the original language of the translated texts.
It is also able to determine which translation algorithm did the translation.
They reported:
“The evaluation results show that TSRT outperforms other methods, with an accuracy of up to 90.2%.
Moreover, TSRT could also identify the original translator and translation language with 93.3% and 85.6% of accuracy, respectively.”
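To make the round-trip idea concrete, here is a minimal sketch in Python. It is not the researchers’ implementation, and the article does not describe how TSRT works internally. The sketch only illustrates a general intuition behind round-trip approaches: text that already came from a translation tool tends to change less when it is translated to another language and back. Everything named here is an assumption for illustration, including the `toy_translate` lookup table (a stand-in for a real translation system), the character-level similarity measure, and the 0.9 threshold.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical translator signature: translate(text, source_lang, target_lang) -> str
Translator = Callable[[str, str, str], str]


def round_trip_similarity(text: str, translate: Translator,
                          lang: str = "en", pivot: str = "fr") -> float:
    """Translate text to a pivot language and back, then compare with the original."""
    forward = translate(text, lang, pivot)
    back = translate(forward, pivot, lang)
    # Character-level ratio between 0 and 1; a simple stand-in for a real similarity metric.
    return SequenceMatcher(None, text.lower(), back.lower()).ratio()


def looks_machine_translated(text: str, translate: Translator,
                             threshold: float = 0.9) -> bool:
    """Flag text whose round trip stays very close to the original (threshold is arbitrary)."""
    return round_trip_similarity(text, translate) >= threshold


def toy_translate(text: str, source: str, target: str) -> str:
    """Illustrative stub only: a tiny lookup table instead of a real translation system."""
    table = {
        ("en", "fr"): {
            "the cat sleeps": "le chat dort",
            "the moggy is having a kip": "le chat dort",
        },
        ("fr", "en"): {"le chat dort": "the cat sleeps"},
    }
    return table[(source, target)].get(text.lower(), text)


for sample in ("the cat sleeps", "the moggy is having a kip"):
    score = round_trip_similarity(sample, toy_translate)
    print(sample, round(score, 2), looks_machine_translated(sample, toy_translate))
```

In this toy setup the plain sentence survives the round trip unchanged, while the colloquial sentence comes back reworded and scores lower. A real detector would need an actual translation system and a threshold or classifier tuned on labeled data.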
It’s unclear if Google can detect translated content and whether or not Google is even trying to detect translated content.
But we do know that technology to detect it exists. The technology can detect translated content better than humans and determine which translation algorithm did the translation.
If the fact that publishing unreviewed machine translations is against Webmaster Guidelines and may create a negative user experience is not enough to motivate editing machine-translated content, then perhaps the possibility that Google analyzes content for signs of machine translation is reason enough to give that kind of content a comprehensive review.
Citation
Listen to the Google Office Hours hangout at the 17:50 minute mark.