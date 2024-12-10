Google has updated its crawler help documentation to add a new section for HTTP caching, which explains how Google's crawlers handle cache control headers. Google also posted a blog post begging us to let Google cache our pages.

Begging might be too much, but Gary Illyes wrote, "Allow us to cache, pretty please" as the first line of the blog post. He then said we allow Google to cache less of our content today than we did 10 years ago. Gary wrote, "the number of requests that can be returned from local caches has decreased: 10 years ago about 0.026% of the total fetches were cacheable, which is already not that impressive; today that number is 0.017%."

Google added an HTTP Caching section to the help document to explain how Google handles cache control headers. Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response header and If-None-Match request header, and the Last-Modified response header and If-Modified-Since request header.
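To make the two header pairs concrete, here is a minimal Python sketch of the requesting side, assuming a simple in-memory cache. The `CachedEntry` structure and the `entry_from_response`, `conditional_headers`, and `resolve_body` helpers are hypothetical illustrations, not Google's crawler code. The sketch remembers the ETag and Last-Modified values from a full response, echoes them back as If-None-Match and If-Modified-Since on the next fetch, and reuses the stored body when the server answers 304 Not Modified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Body and validators remembered from an earlier 200 response (hypothetical structure)."""
    body: bytes
    etag: Optional[str] = None           # value of the ETag response header
    last_modified: Optional[str] = None  # value of the Last-Modified response header


def entry_from_response(headers: dict[str, str], body: bytes) -> CachedEntry:
    """Store the validators a server sent along with a full response."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    return CachedEntry(body=body, etag=lower.get("etag"), last_modified=lower.get("last-modified"))


def conditional_headers(entry: CachedEntry) -> dict[str, str]:
    """Build the request headers for a revalidation fetch of the same URL."""
    headers: dict[str, str] = {}
    if entry.etag is not None:
        # Send the stored ETag back; the server answers 304 if it still matches.
        headers["If-None-Match"] = entry.etag
    if entry.last_modified is not None:
        # Send the stored Last-Modified value back unchanged as If-Modified-Since.
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def resolve_body(status: int, new_body: bytes, entry: CachedEntry) -> bytes:
    """A 304 Not Modified carries no body, so reuse the cached copy; otherwise take the new one."""
    return entry.body if status == 304 else new_body
```

For example, after a 200 response with `ETag: "abc123"`, the next fetch would carry `If-None-Match: "abc123"`, and a 304 reply lets the cached body be reused without downloading it again.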

If both ETag and Last-Modified response header fields are present in the HTTP response, Google's crawlers use the ETag value, as required by the HTTP standard. For Google's crawlers specifically, Google recommends using ETag instead of the Last-Modified header to indicate caching preference, since ETag doesn't have date formatting issues. Other HTTP caching directives aren't supported, Google added.
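On the server side, the precedence rule can be illustrated with this minimal Python sketch, which follows the conditional-request evaluation order for GET requests described in the HTTP standard (RFC 9110). The `evaluate_conditional_get` function and its parameters are hypothetical; this is not how Google or any particular web server implements it. When If-None-Match is present, If-Modified-Since is ignored entirely, and an If-Modified-Since value that is not a valid date is also ignored. That second rule is one way date formatting problems can quietly defeat caching.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _opaque(tag: str) -> str:
    """Drop the weak prefix so W/"x" and "x" compare equal (weak comparison)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def _parse_http_date(value: str) -> Optional[datetime]:
    """Return an aware UTC datetime, or None if the value is not a usable date."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # malformed date string
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        # email.utils returns a naive datetime for "-0000"; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def evaluate_conditional_get(
    request_headers: dict[str, str],
    current_etag: Optional[str],
    last_modified: Optional[datetime],  # must be timezone-aware, second precision
) -> int:
    """Return 304 if the client's cached copy of an existing resource is still valid, else 200."""
    lower = {k.lower(): v for k, v in request_headers.items()}

    if "if-none-match" in lower:
        # ETag wins: with If-None-Match present, If-Modified-Since is never consulted.
        # Simplification: assumes no entity-tag contains a comma.
        candidates = [c.strip() for c in lower["if-none-match"].split(",") if c.strip()]
        if "*" in candidates:
            return 304  # "*" matches any current representation
        if current_etag is not None and any(_opaque(c) == _opaque(current_etag) for c in candidates):
            return 304
        return 200

    if "if-modified-since" in lower and last_modified is not None:
        since = _parse_http_date(lower["if-modified-since"])
        if since is None:
            return 200  # an invalid date is ignored, so the full response is sent
        return 304 if last_modified <= since else 200

    return 200
```

For example, a request carrying both `If-None-Match: "v2"` and an If-Modified-Since date that would otherwise match still gets a full 200 response when the current ETag is `"v1"`, because the ETag check alone decides.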

I should add that Google and Bing have both supported ETag since at least 2018.

Google added a bunch more detail to that section but also expanded this section of the page: