How To Tame GoogleBot

Aug 28, 2008 • 8:01 am | Filed Under Google Search Engine Optimization
 

A Google Groups thread has a detailed discussion about Google's spider, GoogleBot, crawling too much. Servers can be overwhelmed by the traffic they receive, and automated crawlers such as GoogleBot can add significant load to a server that is already under strain. Most webmasters are not in a position to ban GoogleBot from their sites, so what can you do?

Here are some of the tips from the thread, including tips from Google representatives:

  • Make sure GoogleBot is really GoogleBot and not a spammer faking its user agent. More on that over here and here.
  • If you have a large site, use the robots.txt file to tell GoogleBot what it can and cannot crawl.
  • Some URLs are more "expensive" to crawl than others (e.g. static pages versus large, dynamic, graphics-rich pages).
  • Does Google have two or three times as many pages indexed as you have actual product pages on your site? If so, why?
  • Redirect any temporary or tracking URLs with a 301 (permanent) redirect.
  • Set GoogleBot's crawl rate in Google Webmaster Tools; more on that over here.
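For the first tip, Google's published advice for verifying GoogleBot is a two-step DNS check: a reverse lookup on the visiting IP should return a host under googlebot.com (or google.com), and a forward lookup on that host should return the same IP. Below is a minimal Python sketch of that check using only the standard library. The `is_real_googlebot` function and its pluggable resolver arguments are illustrative names, not part of any Google API, the accepted domain list is an assumption to check against Google's current documentation, and the forward lookup only covers IPv4.

```python
import socket
from typing import Callable, Iterable, Tuple

# Hostname suffixes accepted as Google's crawlers. This list is an assumption
# based on Google's published guidance; check their current documentation.
GOOGLE_DOMAINS: Tuple[str, ...] = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP; raises OSError if there is none."""
    hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> Iterable[str]:
    """Return the IPv4 addresses a hostname resolves to; raises OSError on failure."""
    _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    return addrs


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm the IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS record at all
    if not hostname.endswith(GOOGLE_DOMAINS):
        return False  # claims to be GoogleBot, but the host is not Google's
    try:
        # The forward lookup must map back to the same IP; otherwise the PTR
        # record could simply have been set by whoever controls that IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

Passing in fake resolvers makes the check testable offline. In production you would call `is_real_googlebot(ip)` and cache the answer per IP, since two DNS lookups on every request would add load of their own.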

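The robots.txt tip can be tried out locally before a file goes live. This sketch uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt that keeps GoogleBot out of expensive dynamic sections; the paths `/search`, `/cart`, `/print/` and `/admin/` are made-up examples. Note that `robotparser` only does simple prefix matching, while Google's own parser also understands `*` and `$` wildcards, so treat this as an approximation of how GoogleBot will read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a large site: keep GoogleBot away from
# expensive dynamic pages (all paths here are illustrative assumptions).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
Disallow: /cart
Disallow: /print/

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ("/products/widget-42", "/search?q=widgets", "/print/widget-42"):
        # can_fetch() applies the group whose User-agent matches "Googlebot".
        print(path, rp.can_fetch("Googlebot", "https://www.example.com" + path))
```

Because the Googlebot group is matched first, GoogleBot follows only its own rules here, while every other crawler falls through to the `*` group.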
Forum discussion at Google Groups.
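For the 301 tip above (which also bears on the duplicate-page question, since tracking parameters multiply the number of URLs that point at the same product), here is a minimal WSGI middleware sketch in Python. The parameter names in `TRACKING_PARAMS` and the `strip_tracking` and `redirect_tracking_urls` names are hypothetical; substitute whatever your own site appends to its URLs.

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import parse_qsl, quote, urlencode

# Hypothetical tracking/session parameters; replace with the ones your site uses.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def strip_tracking(query: str) -> Tuple[str, bool]:
    """Return (query without tracking params, whether anything was removed)."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    return urlencode(kept), len(kept) != len(pairs)


def redirect_tracking_urls(app: Callable) -> Callable:
    """WSGI middleware: 301-redirect URLs carrying tracking params to the clean URL."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        clean_query, changed = strip_tracking(environ.get("QUERY_STRING", ""))
        if not changed:
            return app(environ, start_response)  # already canonical: serve normally
        # WSGI paths are latin-1 "bytes as str" (PEP 3333), so re-quote as latin-1.
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "") or "/"
        location = quote(path, encoding="latin-1")
        if clean_query:
            location += "?" + clean_query
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Type", "text/plain")],
        )
        return [b"Moved Permanently\n"]

    return middleware
```

You would wrap an existing application with something like `application = redirect_tracking_urls(application)`. Checking whether a parameter was actually removed, rather than comparing re-encoded query strings, avoids redirecting URLs whose only difference is percent-encoding.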


Comments:

ron

08/29/2008 06:20 am

Why should anybody want to "tame" Google or any other bot? Only maybe for bandwidth. If you have a website, you want as much publicity and as many people looking at it as possible. People spend a fortune on SEO for just that reason. If this post helps somebody and gets me one more hit, I will be happy!

Fred

08/29/2008 08:05 am

There is no point going through the hassle of stopping bots on sites. Unlike years past, bandwidth is fairly cheap.

James

08/29/2008 09:52 am

It's not so much about bandwidth as it is about bots sucking up resources such as CPU and MySQL on dynamic sites, because those resources are limited per user on shared web hosting. Bandwidth, yes, is plentiful on modern web hosting plans and is not an issue.

Jacob

08/29/2008 01:48 pm

If Googlebot is helping my site to be indexed on Google then I am not interested in limiting it. http://www.rexxsales.com

Col

08/29/2008 03:37 pm

I am more than happy to have Google checking out my sites.

No Name

08/29/2008 03:39 pm

Thanks for the article. Helpful in trying to limit the crawling.

No Name

08/29/2008 03:41 pm

I didn't know that having too much of the Googlebot is a bad thing.

PaulBuchanan

03/29/2009 07:09 pm

Good point James.
