
Craigslist Blocks Most Spiders: Millions of Pages Delisted

A thread at our forums, Craigslist Delists Millions of Pages from Search Engine Indexes, uncovers the new robots.txt file now in place at Craigslist. It basically reads:

##############################
# Exclude robots from these

User-agent: YahooFeedSeeker
Disallow: /forums
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj

User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj


#####################################

They supposedly had millions of pages indexed at Google, 3.6 million to be exact, and millions more at the other search engines. Now? 211,000 at Google, 280,000 at Yahoo and 4,695 at MSN.
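To get a feel for what those rules cover, here is a rough sketch that feeds the file above to Python's standard `urllib.robotparser` and asks which paths each crawler may fetch. The sample paths and the `ExampleBot` user agent are made up for illustration and are not taken from Craigslist. Python's parser uses simple prefix matching, so real search engine crawlers may read some lines (such as `/?flagCode`) differently.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the post above.
ROBOTS_TXT = """\
User-agent: YahooFeedSeeker
Disallow: /forums
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj

User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
"""

# Hypothetical paths, chosen only to exercise the rules.
SAMPLE_PATHS = ["/about/", "/sss/", "/search/sss", "/forums", "/cgi-bin/foo"]


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def blocked_paths(parser, user_agent, paths):
    """Return the paths this user agent is not allowed to fetch."""
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an invented name; it falls under the "User-agent: *" group.
generic_blocked = blocked_paths(parser, "ExampleBot", SAMPLE_PATHS)
feedseeker_blocked = blocked_paths(parser, "YahooFeedSeeker", SAMPLE_PATHS)

print("Blocked for generic crawlers:", generic_blocked)
print("Blocked for YahooFeedSeeker:", feedseeker_blocked)
```

Under this parser, a generic crawler is shut out of every sample path except the made-up `/about/`, while YahooFeedSeeker is still allowed into `/search` and `/cgi-bin` because its group does not list them.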

Forum discussion at Search Engine Roundtable Forums.




Posted by rustybrick in Search Technology at January 3, 2006 8:10 AM
