GoogleBot Not Sending IF_MODIFIED_SINCE Request?
A WebmasterWorld thread describes an issue with how Google's spider, GoogleBot, crawls some pages. Let me quote the detailed explanation:
I've tried: Checking for the HTTP_IF_MODIFIED_SINCE header and returns "304 Not Modified" if possible.
Problem: Googlebot doesn't always send this header. Even if they already know about a page they doesn't always send the header.
I've tried: Using the expires header to tell google that each page should expire in a month from the request.
Problem: Googlebot keep requesting the pages. They seem to ignore this header.
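For readers unfamiliar with the first technique the poster describes, here is a minimal Python sketch of a conditional GET check. It is not the poster's code; the function name `conditional_response` and its parameters are made up for illustration. It shows why a missing If-Modified-Since header forces a full "200" response: without the header, the server has nothing to compare against.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified):
    """Return (status, headers) for a page last changed at `last_modified`.

    if_modified_since: raw If-Modified-Since value, or None if the crawler
        did not send the header (hypothetical parameter for this sketch).
    last_modified: timezone-aware UTC datetime of the page's last change.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if not if_modified_since:
        return 200, headers  # header absent: a full response is the only option

    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200, headers  # unparseable date: treat the header as absent

    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # assume GMT if no zone given

    if last_modified <= since:
        return 304, headers  # client copy is current: send no body
    return 200, headers
```

In a real application the header value would come from the web framework's request object (for example the CGI-style HTTP_IF_MODIFIED_SINCE variable the poster mentions).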
Brett Tabke, founder of WebmasterWorld, said he noticed these issues as well. jdMorgan, a WebmasterWorld moderator, tried to offer some advice:
Check that the 'expires' header is relative -- Expires after so much time, rather than Expires at a certain time. You should check your Cache-control server response headers as well.
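One way to read that advice is to compute the expiry relative to each request instead of hard-coding a fixed date, and to send a matching Cache-Control max-age. The Python sketch below assumes that reading; the helper name `expiry_headers` is hypothetical, and nothing here guarantees that any particular crawler honors these headers.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(lifetime, now=None):
    """Build caching headers that expire `lifetime` after the request time.

    lifetime: a timedelta, e.g. timedelta(days=30) for roughly a month.
    now: request time as a UTC datetime; defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    return {
        # Relative lifetime in seconds, counted from when the response is served.
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
        # Absolute date derived from the same lifetime, for older clients.
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }


if __name__ == "__main__":
    for name, value in expiry_headers(timedelta(days=30)).items():
        print(f"{name}: {value}")
```

Because both headers come from the same `lifetime`, they cannot drift apart the way a hand-edited fixed Expires date can.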
Is this a Webmaster issue or GoogleBot issue?
Forum discussion at WebmasterWorld.
rustybrick in Google Optimization at October 9, 2007 7:46 AM

Comments
A colleague asked about this a little bit. I think that if we're following a chain of redirects, then we might not send the "If-Modified-Since:" header. I'm not 100% sure, but I don't think we make use of the "Expires:" HTTP header right now. But you might be interested in this post:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
It mentions how you can use an HTTP header like "X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT", but that's more to remove a page after a certain time.
Posted by Matt Cutts at November 5, 2007 17:36