Rumorville, conspiracy theory, and over-speculation have resulted in crazed forum discussion and a possible Slashdot mention. Jason Dowdell, someone I have spoken with often for about a year now, published a blog entry named Microsoft Crawling Google Results For New Search Engine?, which set off a major forum craze. Jason's blog entry was republished at WebProNews and also reprinted in the WebProWorld forum. I decided to hold off on mentioning it until I had heard more from MSN and some of the other experts in the field.
My first impressions were that (1) it was not true, (2) if it was, it would be extremely unethical, (3) they have no reason to go down that path, (4) if someone found out, it would ruin them in the short term, and (5) there might be some legal issues (not that I know for sure).
Of course, many of the forums are discussing this topic. Over at WebmasterWorld, the official MSN representative said the following in response to these rumors: "Also regarding relevance, there has been some speculation on some online forums about MSNBot using Google search result pages to build our index. Let us set the record straight – that is simply not true. We respect robots.txt and as a result we will not crawl Google’s search result pages." GoogleGuy (the official Google representative) responded: "Hey msndude, thanks for the pointers, and thanks for debunking the notion that MSN is crawling via scraping Google's index somehow. I saw an email that someone wrote to us, and it didn't sound like something MSN would do. Glad to hear it from the source though. :)"
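For readers who have not looked at how a crawler honors robots.txt, here is a minimal sketch using Python's standard library. The robots.txt content and the example.com URLs below are made up for illustration (they are not copied from Google or any real site), and the user agent string is just a label. It only shows the general idea behind the MSN statement: if a site disallows its search result path, a crawler that respects robots.txt will skip those URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for illustration only.
# It is not a copy of any real site's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A search result URL falls under the disallowed /search prefix.
    print(can_fetch("msnbot", "http://www.example.com/search?q=site:example.org"))
    # An ordinary page is not covered by any Disallow rule.
    print(can_fetch("msnbot", "http://www.example.com/about.html"))
```

A polite crawler would run a check like this before every request and simply drop any URL for which it returns False.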
So what does leading industry expert Danny Sullivan think? In his post in the SEW thread named Microsoft Scraping Google and Yahoo! SERPS?, he also debunks this rumor. I like every word of his post, so I will quote it now.
There are plenty of software packages that will screen scrape search results in order to create search fodder for those trying to generate AdSense or other traffic.
It's entirely possible that MSN has simply crawled one of these pages. So yes, it would have crawled Google search results -- but these could have been Google search results that were copied and transferred to a different site.
That's far more likely than the idea that MSN is somehow scraping Google. I mean what, MSN starts jumping over to Google, entering site:someonessite.com commands for umpteen million sites to do some guesswork on harvesting sites? Farfetched. Much more likely it ran across the results as I've described.
The actual story is also just incorrect. MSN never required a fee to be spidered. MSN still, on the flagship site, partners with Yahoo for its search results. Yahoo has operated a paid inclusion program but as many will attest, has also spidered pages for free aside from this. MSN dropped paid inclusion pages back in July -- but despite this, they already were and still are crawling the web for free via Yahoo (and via themselves, on the beta site).
And the fastest way to get relevant pages is to crawl Google for every page listed from a site? Not. You'd instead do what the other crawlers do, harvest links from across the web and start indexing the ones you see most often.
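The last point in the quote, harvesting links from across the web and indexing the ones seen most often, can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the page contents and example hostnames are invented, the names LinkExtractor and rank_by_inbound_links are hypothetical, and real crawlers weigh far more than a raw link count. Nothing here is claimed to be how MSN or any other engine actually works.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page they came from.
                    self.links.append(urljoin(self.base_url, value))


def rank_by_inbound_links(pages: dict) -> list:
    """Order discovered URLs by how many distinct pages link to them."""
    counts = Counter()
    for page_url, html in pages.items():
        extractor = LinkExtractor(page_url)
        extractor.feed(html)
        # Count each target once per linking page, however often it repeats.
        counts.update(set(extractor.links))
    return [url for url, _ in counts.most_common()]


# Made-up pages standing in for already crawled documents.
SAMPLE_PAGES = {
    "http://a.example/": '<a href="http://c.example/">c</a> <a href="http://b.example/">b</a>',
    "http://b.example/": '<a href="http://c.example/">c</a>',
    "http://d.example/": '<a href="http://c.example/">c</a> <a href="/local.html">local</a>',
}

if __name__ == "__main__":
    # The most linked-to URL comes first and would be crawled first.
    print(rank_by_inbound_links(SAMPLE_PAGES)[0])
```

In this sample, http://c.example/ is linked from three different pages, so it sits at the front of the crawl queue without anyone having to ask another search engine which pages matter.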
Of course, I am not angry at my buddy Jason; this is just a simple blog-to-blog debate now. :)