GoogleBot Not Sending IF_MODIFIED_SINCE Request?

Oct 9, 2007 • 7:46 am | comments (1) | Filed Under Google Search Engine Optimization
 

A WebmasterWorld thread describes a problem with how Google's spider, GoogleBot, crawls some pages. Here is the original poster's explanation:

I've tried: Checking for the HTTP_IF_MODIFIED_SINCE header and returns "304 Not Modified" if possible.

Problem: Googlebot doesn't always send this header. Even if they already know about a page they doesn't always send the header.

I've tried: Using the expires header to tell google that each page should expire in a month from the request.

Problem: Googlebot keep requesting the pages. They seem to ignore this header.
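For readers who have not set this up, the first approach described above (check the If-Modified-Since request header and answer "304 Not Modified" when the page has not changed) might look roughly like the sketch below. It is not the poster's actual code. The function name `conditional_status` and its arguments are hypothetical, and it assumes the server already knows each page's last-modified time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    if_modified_since: raw header value, or None when the header is absent.
    last_modified: timezone-aware datetime of the page's last change.
    (Both names are illustrative, not taken from the forum thread.)
    """
    if not if_modified_since:
        # No header sent, as reported for some GoogleBot requests:
        # the only option is to serve the full page.
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable date: ignore the header and serve the page.
        return 200
    if since.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

The first branch is the case the poster is complaining about: when the crawler leaves the header out, the server cannot answer with a 304 and has to send the whole page again.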

Brett Tabke, founder of WebmasterWorld, said he noticed these issues as well. jdMorgan, a WebmasterWorld moderator, tried to offer some advice:

Check that the 'expires' header is relative -- Expires after so much time, rather than Expires at a certain time.

You should check your Cache-control server response headers as well.
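To make that advice concrete: the Expires header itself always carries an absolute date, so "relative" presumably means the server computes that date from the time of each request instead of hard-coding one. A minimal sketch of that reading, with a matching Cache-Control value, is below. The function name `caching_headers` and the 30-day lifetime in the usage note are assumptions for illustration, not settings from the thread.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(request_time, max_age_seconds):
    """Build Expires and Cache-Control values relative to the request time.

    request_time must be a datetime whose tzinfo is timezone.utc,
    because HTTP dates are always expressed in GMT.
    """
    expires_at = request_time + timedelta(seconds=max_age_seconds)
    return {
        # Absolute date, but recomputed on every request.
        "Expires": format_datetime(expires_at, usegmt=True),
        # The same lifetime expressed as a relative number of seconds.
        "Cache-Control": f"max-age={max_age_seconds}",
    }
```

Calling `caching_headers(datetime.now(timezone.utc), 30 * 24 * 3600)` would give each response a lifetime of roughly a month from the moment it is served, which matches what the poster says he tried. Whether a given crawler honors these headers is a separate question.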

Is this a webmaster issue or a GoogleBot issue?

Forum discussion at WebmasterWorld.


Comments:

Matt Cutts

11/05/2007 10:36 pm

A colleague asked about this a little bit. I think that if we're following a chain of redirects, then we might not send the "If-Modified-Since:" header. I'm not 100% sure, but I don't think we make use of the "Expires:" HTTP header right now. But you might be interested in this post: http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html It mentions how you can use an HTTP header like "X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT", but that's more to remove a page after a certain time.
