Google Webmaster Tools Warns Of Spikes in Bandwidth Fees

Jul 22, 2008 • 8:31 am | comments (8) | Filed Under Google Search Engine Optimization
 

I have a client with a very large database-driven site. The site is extremely crawlable, which makes for a really nice number of pages targeting very specific search terms. I cannot name the site, because I do not have client approval. But I did want to share a new Google Webmaster Tools message that this client received, which, in a sense, warns the webmaster that Googlebot may "consume much more bandwidth than necessary."

The subject line of the error reads: Googlebot found an extremely high number of URLs on your site

The body of the message reads:

Googlebot encountered problems while crawling your site http://www.domain.com/.

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

More information about this issue. Here's a list of sample URLs with potential problems. However, this list may not include all problematic URLs on your site.

Here is a picture of the message: [screenshot: Googlebot Too Many URLs Warning]

Google goes on to list 20 or so URLs that they found to be problematic. A few of those URLs are 100% already blocked by the robots.txt file on the site, so I am not sure why they show up. The others, I can see why Google might consider them to be "similar content," but technically, they are very different pieces of content.
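
If you want to double-check the robots.txt question yourself, a short script can test each sample URL against the live robots.txt. Here is a minimal sketch in Python; the robots.txt location and sample URLs are hypothetical placeholders, and note that Python's standard parser does not honor Googlebot's wildcard extensions, so treat it as a rough sanity check rather than the final word:

    from urllib.robotparser import RobotFileParser

    # Hypothetical placeholders -- substitute your own domain and the sample
    # URLs listed in the Webmaster Tools message.
    robots_url = "http://www.domain.com/robots.txt"
    sample_urls = [
        "http://www.domain.com/calendar?month=12&year=2038",
        "http://www.domain.com/browse?category=widgets&sort=price",
    ]

    parser = RobotFileParser(robots_url)
    parser.read()  # fetch and parse the live robots.txt

    for url in sample_urls:
        status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
        print(status, url)

If URLs that come back as "blocked" still appear in the report, that points to a problem with the report itself rather than with the robots.txt file.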

In any event, I had two major questions:

(1) Do you think this means Google will trust this site less? I don't think so.

(2) To me, this feels like Google is giving us the option of blocking these URLs before it simply drops them from the index (see the robots.txt sketch below). Google already does this all the time, dropping what it believes to be duplicate URLs, so why does it require a specific message? Or does it mean that Google won't drop them, but is warning that the crawlers will keep crawling and your bandwidth will simply spike?
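
If the answer to (2) is that Google is leaving the choice to the webmaster, the obvious lever is robots.txt: Googlebot supports wildcard patterns, so parameter-driven duplicates can be excluded in a couple of lines. A minimal sketch, with purely hypothetical parameter names and paths that would need to be swapped for whatever actually appears in the sample URL list:

    User-agent: Googlebot
    # Hypothetical examples -- replace with the parameters/paths from your own report
    Disallow: /*?sessionid=
    Disallow: /*&sort=
    Disallow: /calendar/

The trade-off, of course, is that anything blocked this way is also removed from consideration for the index, which is exactly the decision the message seems to be handing back to the site owner.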

I have never really seen a discussion on this specific Webmaster Tools message from Google, so let's start one. Please comment here or join the Search Engine Roundtable Forums thread.

Forum discussion at Search Engine Roundtable Forums.

 

Comments:

JohnMu

07/22/2008 01:50 pm

I believe this is a fairly new message (but the problem is obviously old -- oh those endless calendar scripts...). The problem is that the Googlebot is wasting your server's resources by crawling URLs which most likely don't need to be crawled. There is a chance that we might be missing better content because of that, so we thought it would be good to let webmasters know about these issues and give them a chance to direct us to greener pastures.

Michael Martinez

07/22/2008 05:31 pm

If some of the URLs really are being blocked by robots.txt, there seems to be a problem either with the robots.txt file or with Googlebot, as Googlebot should not be ignoring a robots.txt exclusion directive.

Barry Schwartz

07/22/2008 05:33 pm

Or there is an issue with the report. We have seen issue after issue with Webmaster Tools reports.

chulian

07/24/2008 10:45 am

This message seems to be quite new; I got it too. But in my opinion there must be an issue with it. As already mentioned, I see the same problems: 1) URLs that are blocked via robots.txt are listed; 2) old URLs that respond with 404 are listed. Only some of the examples make sense. If these bugs get solved, it could be a good tool for finding duplicate content issues.

bill

07/30/2008 01:32 pm

I'm seeing the same issue. I wonder if it has something to do with search results being the primary content on a page and non-search results being the secondary source of content? If so, Google is seeing the page as a search result and informing us, in a roundabout way as usual, that these pages should be blocked in our robots.txt file.

Lemyec

08/01/2008 03:30 pm

Same for us, and there is not much information about it on the French web. This message points to the duplicate content issue, but what do you think about this sentence: "Googlebot encountered extremely large numbers of links on your site"? Why are the number of links and duplicate content associated in this message? I don't understand, and I don't see the relation. Sorry, I can't give an example because of a non-disclosure agreement. Kevin

Lemyec

08/01/2008 03:42 pm

Hello! I received the same message on July 28. There is not much information about it on the French web yet, but I would like to know your opinion about this sentence in the message: "Googlebot encountered extremely large numbers of links on your site." What is the relation between the number of links and duplicate content? I don't understand... should I reduce the number of my links? And how do you know when the problem is resolved (I'm not sure Google sends another message to say all is good :), and with millions of pages, how do you resolve all the duplicate content? Hard work ahead. Sorry, but I can't give my website's name because of a non-disclosure agreement. Thanks, Kevin

Mark

09/06/2011 02:20 pm

Just got the same message: "Googlebot encountered problems while crawling your site http://www.seo4.cz/. Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site."
