Google Using HEAD as Opposed to GET Requests More Frequently?

Jul 7, 2006 • 8:19 am | comments (4) | Filed Under Google Search Engine Optimization
 

Honestly, I do not personally track Google at the level of looking at my raw log files daily. But a WebmasterWorld thread describes Google recently (i.e. "for months") using HEAD-only requests for crawling. Typically, Google uses GET requests when crawling pages, to pull the content down into its index. Google has now also been using HEAD requests to pull just the header data.

This is possibly a method to crawl pages more efficiently. With a HEAD request, they can (1) confirm more quickly that the page exists and (2) pull only the last modified date and other header information, without downloading the page body.
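To make the difference concrete, here is a minimal Python sketch, not a description of how Googlebot actually works. It starts a throwaway local server and requests the same page with HEAD and with GET. The handler class, the sample page body, and the Last-Modified value are all made up for illustration. Both requests return the same status and headers, but only GET transfers the body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test page; body and date are arbitrary sample values."""

    BODY = b"<html><body>Hello</body></html>"
    LAST_MODIFIED = "Fri, 07 Jul 2006 08:19:00 GMT"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.send_header("Last-Modified", self.LAST_MODIFIED)
        self.end_headers()

    def do_HEAD(self):
        # HEAD: same status line and headers as GET, but no body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        # Keep the demo quiet; a real server would log each request.
        pass


def fetch(method, host, port, path="/"):
    """Return (status, Last-Modified header, bytes of body received)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # empty for a HEAD response
        return resp.status, resp.getheader("Last-Modified"), len(body)
    finally:
        conn.close()


def demo():
    """Serve the sample page locally and compare HEAD with GET."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return fetch("HEAD", host, port), fetch("GET", host, port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head_result, get_result = demo()
    print("HEAD:", head_result)
    print("GET: ", get_result)
```

A crawler that only wants to know whether a page still exists, or whether its Last-Modified value has changed, could in principle rely on the HEAD result alone and skip the download.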

Forum discussion on this new (??) GoogleBot behavior at WebmasterWorld.

 

Comments:

SEO Egghead

07/07/2006 05:31 pm

I question the usefulness of this, Barry. I believe last modified dates are pretty much defunct these days, since most sites are dynamic and not aware of their last modified dates. PHP will not send a last modified date in its default configuration, as this would be disastrous for several reasons: the timestamp on the page is probably not at all related to the actual age of the content of the page, and page.php?id=X typically houses several thousand pages.

Barry Schwartz

07/07/2006 05:38 pm

Yes, that is true. But why wouldn't they store that info while they are there?

Michael Martinez

07/10/2006 06:31 pm

The apparent upswing in HEAD requests came with the December 2004 update, so far as I have seen in my own log activity. Given that their crawling patterns have changed since Big Daddy, those requests may be proceeding at the same pace as before but now constitute a higher proportion of requests for many sites. The dynamic content issue is a tough one to navigate now because so many people use mod_rewrite or their .htaccess file (or some other method) to assert static URLs for what is actually dynamic content. Maybe Google should invest time in asking people to use and honor a new meta tag: name="static-content" description="no".

JetteroHeller

08/18/2006 07:32 am

I've noticed the same behaviour now cropping up again. Looking through logs of various sites, I've seen Googlebot doing a "HEAD /" on the home page of the site. This then resulted in sites that had 302-redirected home pages showing up in the index, but the "/" page no longer showing up. I saw this in July, but then not again until August 2nd and August 10th, when this "HEAD /" popped up again here and there across many of my sites. Now, all of a sudden, 302-redirected home pages are showing up in the index again. Anyone else getting similar phenomena?
