Yahoo's Crawler Not Listening To Robots.txt Directive?

Sep 19, 2011 • 9:04 am | Filed Under Yahoo Search Engine Optimization
 

A WebmasterWorld thread reports that Yahoo may not be fully obeying the robots.txt directive to block its spider, Yahoo Slurp.

The thing is, Yahoo's spider isn't all that active these days, because Bing now powers much of Yahoo, so BingBot does most of the crawling.

The webmaster said:

Depending on the Host and UA, the official Yahoo! Slurp apparently does whatever it wants to. Note the subtle differences in the subdomains and UAs...

This morning, the only Host to read/heed robots.txt was:

b3091154.crawl.yahoo.net [67.195.112.189] Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

These retrieved graphics by the pageful, over 60 total:

b5101137.yst.yahoo.net [98.137.72.218]
b5101139.yst.yahoo.net [98.137.72.228]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

I am not sure whether this is a widespread issue or just a minor bug.
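For context, here is a rough sketch of what a robots.txt block on Slurp might look like and how a compliant parser would read it. The robots.txt content and the example.com URL are my own assumptions, not the webmaster's actual file, and the check uses Python's standard-library parser, which says nothing about how Yahoo's crawler behaves internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Yahoo Slurp everywhere, allow everyone else.
# This is an illustration, not the file from the forum thread.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(crawler_token: str, url: str) -> bool:
    """Return True if robots.txt permits this crawler to fetch the URL.

    Pass the short product token (e.g. "Slurp"), not the full
    "Mozilla/5.0 (compatible; ...)" string, because Python's parser
    matches on the text before the first "/" in the user agent.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(crawler_token, url)


if __name__ == "__main__":
    # Hypothetical image URL, similar to the graphics mentioned in the thread.
    url = "http://example.com/images/logo.png"
    for token in ("Slurp", "bingbot"):
        verdict = "allowed" if is_allowed(token, url) else "blocked"
        print(f"{token}: {verdict}")
```

With a file like this, a crawler that honors robots.txt and identifies as Slurp should not be fetching graphics at all, which is why the requests from the yst.yahoo.net hosts look suspicious.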

The main question is, should you care if Yahoo is crawling your site when Bing is? That discussion is also taking place in the forum thread. The answer is, it depends.

Forum discussion at WebmasterWorld.
