It seems that a lot of well-known companies have webmasters (or legal departments) who just don't have a clue how to implement a robots.txt file. According to a DigitalPoint Forums thread, the United Kingdom-based Daily Telegraph is looking to sue Google and Yahoo for accessing its content.
In a statement quoted in the Guardian Unlimited, the paper says it is concerned that these search engines access its content for free and don't give it proper credit:
"Our ability to protect content is under consistent attack from those such as Google and Yahoo who wish to access it for free. These companies are seeking to build a business model on the back of our own investment without recognition. All media companies need to be on guard for this. Success in the digital age, as we have seen in our own company, is going to require massive investment... [this needs] effective legal protection for our content, in such a way that allows us to invest for the future."
Apparently, they're clueless about implementing a robots.txt file that would prevent search engines from accessing content "for free." As of this writing, this is their current robots.txt file:
# Robots.txt file
# All robots will spider the domain
User-agent: *
Disallow: */ixale/
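In other words, their own file invites every robot to spider the site and blocks only a single path pattern. Not only that, but they have the ability to remove content they don't want listed from the SERPs in Google and in Yahoo.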
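To make the robots.txt point concrete, here is a minimal sketch of what a block-everything file could look like and how a compliant crawler would read it. It uses Python's standard urllib.robotparser module. The BLOCK_ALL rules, the is_allowed helper, and the example.com URLs are illustrative assumptions, not the Telegraph's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every compliant crawler to stay out
# of the whole site (an assumption for illustration, not the Telegraph's file).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Googlebot and Slurp are the crawler names used by Google and Yahoo.
    for agent in ("Googlebot", "Slurp"):
        allowed = is_allowed(
            BLOCK_ALL, agent, "http://www.example.com/news/story.html"
        )
        print(agent, allowed)
```

With these rules both checks come back False. Narrowing the Disallow line to one directory would block only that section and leave the rest crawlable. Keep in mind that robots.txt is advisory: it works because crawlers such as Google's and Yahoo's choose to honor it.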
It is a bit disturbing how many people are worried about search engines (which ultimately give them more visibility!). The claim that search engines don't respect their rules goes both ways. Daily Telegraph, I imagine you have rules you want Google and Yahoo to respect. Well, the search engines have rules too. Follow them and you'll be fine.
Feel free to add your two cents on the DigitalPoint Forums thread.
Tamar Weinberg in Google Search Engine at April 25, 2007 11:20 AM
Comments (6)

Wow. This is wrong on so many levels. Not only do they have control over robots with robots.txt, but they should publish how much traffic they get from Google and Yahoo. My guess is they know exactly how much traffic they get, and they know all about using robots.txt. They want the visibility of Google and Yahoo without the indexing of the search engines (alleged copyright infringement). Uhhh...you can't have it both ways.
Posted by Brian Rants at April 25, 2007 13:19