I spotted two threads this month discussing how little of their sites Microsoft is indexing, yet how often Microsoft's bot, MSNbot, crawls them.
I was just looking through my logs and noticed that msnbot was crawling our site pretty hard, grabbing about 10% of the site in the last half hour or so.
I just checked the site: command on Live and we've only got about 100 pages in their index now - far fewer pages than the crawl activity would suggest.
Billy, like others, wonders if he should just block MSNbot altogether, since he feels the traffic he receives from Live Search is not worth the stress the bot puts on the server when it crawls.
Let's do some comparisons of Google versus Live Search in site command counts:
- site:www.seroundtable.com - Google 12,800 vs. Live Search 268
- site:www.google.com - Google 1,680,000 vs. Live Search 330
- site:www.flickr.com - Google 71,000,000 vs. Live Search 268
- site:www.webmasterworld.com - Google 446,000 vs. Live Search 357
So either I am doing something wrong, Live Search's site: command is broken, or Live Search has forgotten how to index pages.
Update: Microsoft sent me a response to this post, which I felt would be great to add.
For webmasters, it is problematic to use the "site:" operator to determine how many pages for a site are included in the Live Search index. The "site:" operator generates an estimate of the pages in the index. These numbers can vary wildly depending on when you execute the query.
You posed the question about whether users should block MSNbot because traffic from the bot is not worth the stress on your servers. Obviously, we would prefer that customers not block MSNbot; rather, customers who are concerned about stress from Live Search crawls should add the crawl-delay parameter to their robots.txt file. This can help reduce the load on your servers while still keeping you in the Live Search results. Webmasters can refer to the MSNbot support page for more information on crawl-delay.
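For readers unfamiliar with the directive Microsoft mentions, a robots.txt entry using crawl-delay might look something like this (the 10-second value is just an illustration; pick a delay appropriate for your server):

```
# Ask Microsoft's crawler to wait between successive requests
User-agent: msnbot
Crawl-delay: 10
```

Note that crawl-delay is a non-standard extension honored by some crawlers (including MSNbot) but not all, so it only throttles bots that choose to respect it.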