Google Reveals Two New Web Crawlers via @sejournal, @martinibuster
Google announces details of two new crawlers that are optimized for scraping images and videos for research and development purposes The post Google Reveals Two New Web Crawlers appeared first on Search Engine Journal.
Google revealed details of two new crawlers that are optimized for scraping image and video content for “research and development” purposes. Although the documentation doesn’t explicitly say so, it’s presumed that blocking the new crawlers has no impact on rankings.
It should be noted that the data scraped by these crawlers is not explicitly designated as AI training data; AI training is what the separate Google-Extended robots.txt token is for.
GoogleOther Crawlers
The two new crawlers are versions of GoogleOther, the crawler Google launched in April 2023. The original GoogleOther crawler is also designated for use by Google product teams for research and development, in what Google describes as one-off crawls, and that description offers clues about what the new GoogleOther variants will be used for.
The purpose of the original GoogleOther crawler is officially described as:
“GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”
Two GoogleOther Variants
There are two new GoogleOther crawlers:
GoogleOther-Image
GoogleOther-Video
The new variants are for crawling binary data, which is data that isn’t text. HTML files are text files, also referred to as ASCII or Unicode files: if a file can be read in a plain text viewer, it’s a text file. Binary files are files that can’t be opened in a text viewer app, such as image, audio, and video files.
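To make the text-versus-binary distinction concrete, here is a rough Python heuristic. The function name is hypothetical, and this is not how Google classifies content: it simply treats data as binary if it contains a NUL byte or does not decode as UTF-8 (which covers plain ASCII). The first eight bytes of a PNG image count as binary, while a snippet of HTML does not.

```python
def looks_binary(data: bytes) -> bool:
    """Rough heuristic (hypothetical helper): True if the bytes look binary rather than text."""
    # NUL bytes are rare in ASCII/UTF-8 text but common in image, audio, and video data.
    if b"\x00" in data:
        return True
    try:
        # Valid UTF-8 (which includes plain ASCII) is treated as text.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature at the start of every PNG file
    html_snippet = b"<!doctype html><p>Hello</p>"
    print(looks_binary(png_signature))  # True
    print(looks_binary(html_snippet))   # False
```

Text saved in a legacy, non-UTF-8 encoding would be misreported as binary, so treat this as an illustration of the idea rather than a reliable detector.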
The new GoogleOther variants are for image and video content. Google lists user agent tokens for both of the new crawlers, and these tokens can be used in a robots.txt file to block them.
1. GoogleOther-Image
User agent tokens:
GoogleOther-Image
GoogleOther
Full user agent string:
GoogleOther-Image/1.0
2. GoogleOther-Video
User agent tokens:
GoogleOther-Video
GoogleOther
Full user agent string:
GoogleOther-Video/1.0
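As a sketch of how these tokens are used, here is a hypothetical robots.txt that blocks only GoogleOther-Image and allows everything else, checked with Python’s standard urllib.robotparser module. The robots.txt rules and the example.com URL are illustrative assumptions, not taken from Google’s documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the image variant of GoogleOther everywhere,
# leave every other crawler (including plain GoogleOther) unrestricted.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/images/photo.jpg"  # placeholder URL
    # The "/1.0" suffix is dropped by the parser, so full strings like
    # "GoogleOther-Image/1.0" are matched on their token.
    for agent in ("GoogleOther-Image/1.0", "GoogleOther-Video/1.0", "GoogleOther"):
        print(agent, rp.can_fetch(agent, url))
```

urllib.robotparser applies the first group whose name appears in the crawler’s token, which is simpler than Google’s documented rule of choosing the most specific matching group, so the group for the specific variant is listed first here. Because both variants also list GoogleOther as a token, rules written for plain GoogleOther appear to apply to them as well when no more specific group exists.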
Newly Updated GoogleOther User Agent Strings
Google also updated the user agent strings for the regular GoogleOther crawler. For blocking purposes, you can continue using the same user agent token as before (GoogleOther). The new user agent strings are just the data sent to servers to describe the crawler in full, in particular the technology it uses. In this case the technology is Chrome, with the version number periodically updated to reflect which version is in use (W.X.Y.Z is a placeholder for the Chrome version number in the strings listed below).
The full list of GoogleOther user agent strings:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36
GoogleOther Family Of Bots
These new bots may show up in your server logs from time to time. This information will help publishers identify them as Google crawlers and, if they choose, opt out of having their images and videos scraped for research and development purposes.
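A minimal Python sketch, assuming you have already pulled User-Agent values out of your server logs, that labels each one with the most specific GoogleOther token it contains and reads the Chrome version from the full string. The function names and the sample agents are hypothetical. A user agent string is easy to spoof, so a match only shows that a request claims to be a GoogleOther crawler; Google’s crawler documentation describes confirming genuine crawlers with a reverse DNS lookup.

```python
import re
from typing import Optional

# User agent tokens from Google's documentation. The two variant tokens are
# checked before the generic one because "GoogleOther" is a prefix of both.
GOOGLEOTHER_TOKENS = ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther")


def googleother_token(user_agent: str) -> Optional[str]:
    """Return the most specific GoogleOther token in a User-Agent value, or None (hypothetical helper)."""
    for token in GOOGLEOTHER_TOKENS:
        # \b keeps the token from matching in the middle of a longer word.
        if re.search(r"\b" + re.escape(token) + r"\b", user_agent):
            return token
    return None


def chrome_version(user_agent: str) -> Optional[str]:
    """Return whatever follows "Chrome/" (the W.X.Y.Z part), or None (hypothetical helper)."""
    match = re.search(r"Chrome/(\S+)", user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample_log_agents = [
        "GoogleOther-Image/1.0",
        "GoogleOther-Video/1.0",
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) "
        "Chrome/W.X.Y.Z Safari/537.36",
        "SomeOtherBot/2.0",  # made-up, non-Google agent
    ]
    for ua in sample_log_agents:
        print(googleother_token(ua), chrome_version(ua))
```

Run as a script, it prints the token and the Chrome version for each sample; the image and video crawlers have no Chrome version in their full strings, and the made-up agent yields None for both.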
Read the updated Google crawler documentation
Featured Image by Shutterstock/ColorMaker
SEJ STAFF Roger Montti Owner - Martinibuster.com at Martinibuster.com
I have 25 years hands-on experience in SEO and have kept on top of the evolution of search every step ...