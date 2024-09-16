Google has completely reorganized its crawlers and user-triggered fetchers documentation. It used to live on one page and is now split across several pages. Most of the changes just moved content around, but Google did add a section on which products each crawler affects, plus a robots.txt snippet for each crawler to demonstrate how to use the user agent tokens.

Google wrote, "The documentation grew very long which limited our ability to extend the content about our crawlers and user-triggered fetchers," so that is why they redid it.

If you dig into each crawler, for example Googlebot, you will see two new sections:

(1) Affected products

(2) Example robots.txt group

Here is a sample screenshot; what I highlighted in red was added for every crawler:

Here is how each crawler affects products:

Googlebot: Crawling preferences addressed to the Googlebot user agent affect Google Search (including Discover and all Google Search features), as well as other products such as Google Images, Google Video, Google News, and Discover.

Googlebot Image: Crawling preferences addressed to the Googlebot-Image user agent affect Google Images, Discover, Google Video, and all features in Google Search where images, logos, and favicons are presented.

Googlebot Video: Crawling preferences addressed to the Googlebot-Video user agent affect video-related Google Search features and other products dependent on videos.

Googlebot News: Crawling preferences addressed to the Googlebot-News user agent affect all surfaces of Google News (for example, the News tab in Google Search and the Google News app).

Google StoreBot: Crawling preferences addressed to the Storebot-Google user agent affect all surfaces of Google Shopping (for example, the Shopping tab in Google Search and Google Shopping).

Google-InspectionTool: Crawling preferences addressed to the Google-InspectionTool user agent affect Search testing tools such as the Rich Result Test and URL inspection in Search Console. It has no effect on Google Search or other products.

GoogleOther: Crawling preferences addressed to the GoogleOther user agent don't affect any specific product. GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. It has no effect on Google Search or other products.

GoogleOther-Image: Crawling preferences addressed to the GoogleOther-Image user agent don't affect any specific product, similar to GoogleOther. GoogleOther-Image is the version of GoogleOther optimized for fetching publicly accessible image URLs.

GoogleOther-Video: Crawling preferences addressed to the GoogleOther-Video user agent don't affect any specific product, similar to GoogleOther. GoogleOther-Video is the version of GoogleOther optimized for fetching publicly accessible video URLs.

Google-CloudVertexBot: Crawling preferences addressed to the Google-CloudVertexBot user agent affect crawls requested by site owners for building Vertex AI Agents. It has no effect on Google Search or other products.

Google-Extended: Google-Extended is a standalone product token that web publishers can use to manage whether their sites help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products. Google-Extended does not impact a site's inclusion or ranking in Google Search.

APIs-Google: Crawling preferences addressed to the APIs-Google user agent affect the delivery of push notification messages by Google APIs.

AdsBot Mobile Web: Crawling preferences addressed to the AdsBot-Google-Mobile user agent affect Google Ads' ability to check web page ad quality.

AdsBot: Crawling preferences addressed to the AdsBot-Google user agent affect Google Ads' ability to check web page ad quality.

AdSense: Crawling preferences addressed to the Mediapartners-Google user agent affect Google AdSense. The AdSense crawler visits participating sites in order to provide them with relevant ads.

Google-Safety: The Google-Safety user agent handles abuse-specific crawling, such as malware discovery for publicly posted links on Google properties. As such it's unaffected by crawling preferences.

Feedfetcher: Feedfetcher is used for crawling RSS or Atom feeds for Google News and PubSubHubbub.

Google Publisher Center: Google Publisher Center fetches and processes feeds that publishers explicitly supplied for use in Google News landing pages.

Google Read Aloud: Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS).

Google Site Verifier: Google Site Verifier fetches Search Console verification tokens.
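
To make the pairing of "affected products" and "example robots.txt group" a bit more concrete, here is a minimal Python sketch that keeps a few of the user agent tokens from the list above in a lookup table and builds a robots.txt group for one of them. The product notes are abbreviated from the list, and the names AFFECTED_PRODUCTS and build_group, along with the /archive/ path, are hypothetical helpers I made up for illustration. They are not anything Google publishes.

```python
# Abbreviated from the list above; only a handful of tokens are included.
AFFECTED_PRODUCTS = {
    "Googlebot": "Google Search, Google Images, Google Video, Google News, Discover",
    "Googlebot-Image": "Google Images, Discover, Google Video, image features in Search",
    "Googlebot-News": "all surfaces of Google News",
    "Storebot-Google": "all surfaces of Google Shopping",
    "Google-Extended": "Gemini Apps and Vertex AI generative APIs",
    "AdsBot-Google": "Google Ads web page ad quality checks",
    "Mediapartners-Google": "Google AdSense",
}


def build_group(token, disallow=(), allow=()):
    """Return a robots.txt group addressed to one user agent token.

    The group starts with a comment naming the affected products so the
    person editing robots.txt can see what the rules will influence.
    """
    if token not in AFFECTED_PRODUCTS:
        raise KeyError(f"unknown user agent token: {token}")
    lines = [
        f"# Affects: {AFFECTED_PRODUCTS[token]}",
        f"User-agent: {token}",
    ]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical path, chosen only for the example.
    print(build_group("Googlebot-News", disallow=["/archive/"]))
```

Keeping the product note next to the token is just a convenience. Google-Safety is left out on purpose, since the list above says it is unaffected by crawling preferences.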

Google wrote:

Reorganized the documentation for Google's crawlers and user-triggered fetchers. We also added explicit notes about what product each crawler affects, and added a robots.txt snippet for each crawler to demonstrate how to use the user agent tokens. There were no meaningful changes to the content otherwise.
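
If you want to see how rules addressed to a single user agent token behave, here is a rough sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the can_crawl helper are all assumptions for illustration. Python's parser is also not Google's parser (it matches tokens by simple substring and uses the first matching group), so treat the result as an approximation and not as a statement of how Google will read your file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the Googlebot-News token.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(token, url, robots_txt=ROBOTS_TXT):
    """Check whether a user agent token may fetch a URL under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    url = "https://example.com/story.html"
    for token in ("Googlebot-News", "Googlebot"):
        print(token, can_crawl(token, url))
```

In this sketch the group addressed to Googlebot-News applies only to that token, while Googlebot falls through to the catch-all group.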

Google also added information about the content encodings (compressions) supported by Google's crawlers and user-triggered fetchers. This is just a documentation change, with no change in behavior. Google also updated the URL in the GoogleProducer HTTP user agent string in the documentation for Google's user-triggered fetchers to match the value used by the actual fetcher.
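
On the content encodings point, a server typically picks a compression based on the Accept-Encoding header a client sends. Below is a minimal sketch of that negotiation in Python. As I understand the documentation, the encodings listed are gzip, deflate, and Brotli (br), but the header values, the server preference order, and the function names here are my own assumptions for the example.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header value into {encoding: q-value}."""
    prefs = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        quality = 1.0  # an encoding without a q parameter defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = quality
    return prefs


def choose_encoding(header, server_supported=("br", "gzip", "deflate")):
    """Return the first server-preferred encoding the client accepts.

    Returns None when nothing matches, meaning the response should be
    sent uncompressed.
    """
    prefs = parse_accept_encoding(header)
    for encoding in server_supported:
        if prefs.get(encoding, 0.0) > 0:
            return encoding
    return None


if __name__ == "__main__":
    # Illustrative header value, not captured from a real crawler request.
    print(choose_encoding("gzip, deflate, br"))
```

This sketch ignores wildcard entries such as "*" to stay short, so a production server would want a more complete negotiation.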

