Robots.txt file ignored by Google?

Feb 20, 2006 • 10:49 am | comments (5) by twitter | Filed Under Other Google Topics
 

Apparently the Googlebot is failing to heed the "Keep Away!" that the robots.txt file is supposed to yell authoritatively. Rand mentioned something like this the other day, and critter over at SEW forums described the following when he started a thread asking "What's The Point of A Robots.txt File If Google Ignores It?"

I noticed today Google indexing my images folder, even though I explicitly prevent ALL SEARCH ENGINE SPIDERS from indexing that folder for various reasons. I have had this robots.txt file in the root of my site since the day it was launched and am quite annoyed and frustrated with Google for ignoring it and indexing the contents of the folder anyway.
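
For reference, a rule of the kind critter describes can be checked with Python's standard urllib.robotparser module. The thread does not reproduce his actual file, so the rules, the /images/ path, and the example.com URLs below are assumptions made purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the thread does not show critter's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the site from the thread.
    print(is_allowed("Googlebot", "http://example.com/images/logo.gif"))
    print(is_allowed("Googlebot", "http://example.com/index.html"))
```

With these assumed rules, a compliant crawler should skip everything under /images/ and remain free to fetch the rest of the site.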

Maybe we know now why Matt Cutts doesn't use one? :p

After assuring one member that the robots.txt was properly formatted, critter gets some further support for his assertion. I will be examining some log files over the next week to look for any such instances, and I'm sure we would all love to see more evidence of this in the thread.
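
For anyone who wants to run the same kind of log check, here is a minimal sketch. It assumes an Apache-style "combined" access log and the same hypothetical rule blocking /images/; the function name, the regular expression, and the rules are all illustrative, not taken from the thread.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed rules; replace with the contents of the site's real robots.txt.
ROBOTS_TXT = "User-agent: *\nDisallow: /images/\n"

# Assumed log format: Apache/NCSA "combined" (request, status, size, referer, agent).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_googlebot_hits(log_lines, robots_txt=ROBOTS_TXT):
    """Return paths requested by a Googlebot user-agent despite a Disallow rule."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue  # unparseable line, or not a self-identified Googlebot
        if not parser.can_fetch("Googlebot", match.group("path")):
            hits.append(match.group("path"))
    return hits
```

Calling disallowed_googlebot_hits(open("access.log")) on a real log (the file name is a placeholder) lists each offending path. The check only matches the user-agent string, which can be spoofed, so a hit is a lead to investigate and not proof on its own.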

Read about it and post your thoughts at Search Engine Watch Forums.

See Rand’s post at SEOmoz Blog.

 

Comments:

Brandon

02/20/2006 05:41 pm

Perhaps Google really wants to index EVERYTHING and is saving the robots.txt file as an exclude list for what it serves in the SERPs. More indexing means more knowledge. Brandon -- Inspired Impressions Blog at http://inspiredimpressions.wordpress.com Usability, Internet Marketing and SEO

Neuronimbus Software Services

02/21/2006 06:48 am

It's all about indexing and crawling. Google has a new feature in Google Sitemap: robots.txt. If you upload a robots.txt, it means that you allow Google to index. We have also done this in meta tags, but for now you need to create a robots.txt and Google will crawl every page. Suggestion: it's not necessary to allow everything in robots.txt. Exclude the pages or directories that you don't want indexed. Google is now offering more and more features to webmasters free of cost. Regards, Neuronimbus Software Services Pvt. Ltd.

Telian Adlam

02/21/2006 05:51 pm

Last year, I noted that some Google sites had indexed the images on my personal blog (http://www.mildinsanity.com/archives/2005/04/07/something-disturbing/) and were, in fact, returning those images in search results. Upon contacting Google, I received an e-mail from them denying that my site's image folder had been crawled or indexed. I did a subsequent search and all of my images were gone. (Wish I had snapped some screenshots, but hindsight is 20/20.) Even with a few of my current websites, though my robots.txt file explicitly denies access to certain folders/files, they remain indexed in Google. I don't believe Google is deliberately trying to be evil, but I have to echo the sentiments of that thread - what is the point of having a robots.txt file if Googlebot ignores it? Telian Adlam, Destination: Success (http://www.buniek.com/)

pk_synths

02/21/2006 10:55 pm

There is a new googlebot that's going around using Mozilla as the user-agent and NOT googlebot. The new bot doesn't seem to follow the traditional spider rules. This could be why it's ignoring robots.txt. -PK

San

02/25/2008 11:45 pm

I agree something at Google is ignoring the robots.txt file, but Google is too big to care about the law. Caching, PDF stripping and YouTube are other examples where Google doesn't care about your copyrights. LOL.
