Is Microsoft Live Search Crawling More But Indexing Less?

Nov 24, 2008 • 7:42 am | comments (4) | Filed Under Bing Search
 

I spotted two threads this month in which webmasters discuss how few of their pages Microsoft is indexing compared with how often Microsoft's bot, MSNbot, is crawling their sites.

A WebmasterWorld thread and a DigitalPoint Forums thread have details of the newish behavior from MSNbot. BillyS at WebmasterWorld explains:

I was just looking through my logs and noticed that msnbot was crawling our site pretty hard, grabbing about 10% of the site in the last half hour or so.

I just checked the site: command on Live and we've only got about 100 pages in their index now - which is fewer than the number of pages mentioned above.

Billy, like others, wonders whether to block MSNbot altogether, since the traffic they receive from Live Search does not seem worth the load the bot puts on their servers when it crawls.

Let's do some comparisons of Google versus Live Search in site command counts:

So either I am doing something wrong, Live Search's site command is wrong, or Live Search forgot how to index pages?

Forum discussion at WebmasterWorld and DigitalPoint Forums.

Update: Microsoft sent me a response to this post, which I felt would be great to add.

For webmasters, it is problematic to use the “site:” operator to determine how many pages for a site are included in the Live Search index. The “site:” operator generates an estimate of the pages in the index. These numbers can vary wildly depending on when you execute the query.

You posed the question about whether users should block MSNbot because traffic from the bot is not worth the stress on your servers. Obviously, we would prefer that customers not block MSNbot, rather customers who are concerned with stress from Live Search crawls should add the crawl-delay parameter to their robots.txt file. This can help reduce the load on your servers and still be a part of the Live Search results. Webmasters can refer to the MSNBot support page for more information on crawl-delay.
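
As a rough illustration of the crawl-delay suggestion, here is a minimal Python sketch that embeds a hypothetical robots.txt and reads it back with the standard library's `urllib.robotparser`. The 10 second delay is an assumed value for illustration, not a number Microsoft gave, and the `msnbot` user-agent token is taken from the bot name used in this post; the MSNBot support page is the place to confirm what Live Search actually honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow msnbot down without blocking it.
# The value 10 (seconds between requests) is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay that applies to msnbot, in seconds.
msnbot_delay = parser.crawl_delay("msnbot")

# Other bots match the catch-all entry, which sets no delay.
other_delay = parser.crawl_delay("Googlebot")

# msnbot is throttled, not blocked: pages remain fetchable.
msnbot_allowed = parser.can_fetch("msnbot", "/any/page.html")

print(msnbot_delay, other_delay, msnbot_allowed)
```

The point of this layout is that the bot stays eligible to crawl and index every page, which is what Microsoft asks for, while the per-bot entry only spaces out its requests.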

 

Comments:

gabs

11/24/2008 01:04 pm

The Live site: command doesn't really work. You'll see repeats in the pages being listed, e.g. #8 and #16 will be the same indexed URL.

Saad Kamal

11/24/2008 05:42 pm

I figured that they show a different stat if you get rid of that "www" in front. There is a possibility that they treat the www and the non-www version of the sites as different websites.. :S

Michael Martinez

11/24/2008 05:49 pm

I'm able to see significantly more pages in Live's site searches for the queries you suggest in your article. Perhaps Live is rolling out a new index.

Jaan Kanellis

11/25/2008 02:26 am

They have a different operator than Google: http://blogs.msdn.com/livesearch/archive/2006/10/16/search-macros-linkfromdomain.aspx
