Yahoo! Slurp on the Loose?

Jan 17, 2007 • 7:57 am | comments (5) | Filed Under Yahoo Search Engine Optimization
 

Threads at WebmasterWorld and Search Engine Watch Forums are both reporting issues with Yahoo! Slurp (Yahoo!'s crawler) indexing pages it should not be, and in quantities that may be harmful.

It appears that only specific bots are ignoring the robots.txt file, and they are indexing pages at rates that can potentially cause server issues.
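
For anyone checking their own rules, here is a minimal sketch using Python's standard urllib.robotparser to see what a robots.txt file actually permits for a given crawler token. The file contents, the example.com URLs, and the helper names are made up for illustration and are not taken from the forum threads. Crawl-delay is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

for agent, url in [
    ("Slurp", "http://example.com/private/page.html"),
    ("Slurp", "http://example.com/tmp/report.html"),
    ("OtherBot", "http://example.com/tmp/report.html"),
]:
    print(agent, url, is_allowed(parser, agent, url))

# Requested delay in seconds for Slurp, or None if the group sets none.
print("Slurp crawl delay:", parser.crawl_delay("Slurp"))
```

In this sketch a crawler with its own User-agent group is judged by that group only, so the /tmp/ rule under * does not apply to Slurp. Real crawlers may treat edge cases differently from Python's parser, so take the result as a sanity check and not as proof of what a bot will do.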

The specific IP addresses appear to be in the 74.6.x block. They do resolve via reverse DNS to inktomi, which is correct for Slurp.
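
To check whether a request really came from one of these crawlers, a common approach is a reverse DNS lookup followed by a forward lookup that confirms the hostname maps back to the same IP. A rough sketch in Python follows. The 74.6.0.0/16 range is only one reading of "74.6.x", and the hostname suffix is an assumption based on the Inktomi name, so verify both against your own logs before relying on them. The function names are hypothetical.

```python
import ipaddress
import socket

# Assumptions for illustration: the /16 is inferred from "74.6.x" in the post,
# and the suffix is a guess based on the Inktomi name. Confirm both yourself.
SUSPECT_BLOCK = ipaddress.ip_network("74.6.0.0/16")
TRUSTED_SUFFIXES = (".inktomisearch.com",)


def in_suspect_block(ip: str) -> bool:
    """Return True if ip falls inside the block mentioned in the post."""
    return ipaddress.ip_address(ip) in SUSPECT_BLOCK


def verify_crawler(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check.

    The lookup functions are parameters so the check can be exercised
    without network access; by default they use the system resolver.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or resolver failure

    # A PTR record alone can be faked by whoever controls the IP's
    # reverse zone, so require a trusted suffix and a forward match.
    if not hostname.lower().endswith(TRUSTED_SUFFIXES):
        return False

    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling verify_crawler("74.6.1.2") with the defaults performs live DNS queries, so results depend on your resolver and on how the address is configured today.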

Forum discussion at WebmasterWorld and Search Engine Watch Forums.

 

Comments:

Mike

01/17/2007 02:44 pm

My blog is the Maytag man of the blogosphere. Yet Yahoo! (Inktomi) crawls it every day faithfully. Just me and Inktomi but I am thankful they notice, LOL.

Adam Audette

01/17/2007 04:31 pm

This was discussed on the LED Digest last week - the original post is in #2321: http://www.led-digest.com/content/view/1701/55/ with responses in the next 3-4 issues. As far as I know the OP never resolved this, but he did offer a piece of advice: "Feature request for SE spiders: Provide a referrer. Please. It would make me and I expect other site owners feel grateful when odd URL requests are noticed. If more than one referrer, then just any one -- the last one, the first one, doesn't matter which. Referrer information could save people a lot of time, and let them keep their hair a while longer." Hope this info helps...

Barry Schwartz

01/17/2007 04:39 pm

Thanks Adam, sorry for missing it.

Tim

01/17/2007 07:01 pm

I answered the specific question on WebmasterWorld. It does not seem that there is an issue with the crawler in this instance, but rather an incorrect interpretation of the robots.txt syntax by the publisher. Tim

Ram

01/18/2007 05:59 am

Is Yahoo's bot based on Wget? I ask because the following line was in my log file:

2007-01-17 17:44:24 W3SVC105 NT-110 XX.XX.XX.XX GET / - 80 - 66.228.165.49 HTTP/1.0 Wget/1.8.2 - - www.mydomain.com 200 0 0 11155 111 578

The IP reverse-DNSes to i18ndev23.yst.corp.yahoo.com.
