Google: Remove The Robots.txt File Completely

Jan 6, 2011 • 8:31 am | Filed Under Spiders

Believe it or not, I am not a huge fan of placing robots.txt files on sites unless you specifically want to block content or sections from Google or other search engines. It has always felt redundant to tell a search engine it may crawl your site when it will do so anyway unless you tell it not to.

Google's JohnMu confirmed as much in a Google Webmaster Help thread, even recommending that one webmaster remove their robots.txt file "completely."

John said:

I would recommend going even a bit further, and perhaps removing the robots.txt file completely. The general idea behind blocking some of those pages from crawling is to prevent them from being indexed. However, that's not really necessary -- websites can still be crawled, indexed and ranked fine with pages like their terms of service or shipping information indexed (sometimes that's even useful to the user :-)).

I know many SEOs feel it is mandatory to have a robots.txt file, even if it just says User-agent: * Allow: /. Why bother, when the crawlers will eat up your content either way?
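
For reference, that "mandatory" file in full is just the following, which is functionally identical to having no robots.txt at all, since crawlers default to crawling everything they can reach:

    User-agent: *
    Allow: /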

Anyway, it is nice to see a Googler confirming this, at least in this case.

Forum discussion at Google Webmaster Help.


Comments:

Moosa Hemani

01/06/2011 01:43 pm

Jesus! I was doing this, but every single word of it is very logical. Most of the websites I have for clients have a robots.txt file added, and it says the same thing: User-agent: * Allow: /. Now I'm looking at the computer and talking to myself: how foolish you are, Hemani. :p Thanks for the post! This is really going to help.

Peter Handley

01/06/2011 02:02 pm

I'd sometimes use an almost-empty robots.txt just for the sitemap.xml reference, as well as for crawl prevention. So long as you have a sitemap to reference, even an "empty" robots.txt file can have something useful in it. I would just allow the lot, though.
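
A minimal sketch of the kind of file Peter describes, with nothing but a sitemap reference (the sitemap URL is a hypothetical placeholder):

    User-agent: *
    Disallow:

    Sitemap: http://www.example.com/sitemap.xml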

Brian R. Brown

01/06/2011 02:23 pm

This doesn't take into account that 404 reporting may be skewed by the lack of a robots.txt, which most engines, Google included, will request by default. I'd prefer not to use the "Allow" directive, though; rather, an empty "Disallow", which, yes, seems silly and confusing, but should be better understood and more universally accepted by search engines, where "Allow" may not be. Granted, the engines that truly matter should interpret either directive correctly.
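
For clarity, the two "allow everything" spellings Brian contrasts look like this (the # lines are just comments):

    # Explicit Allow: a later extension, not in the original standard
    User-agent: *
    Allow: /

    # Empty Disallow: the original, universally supported idiom
    User-agent: *
    Disallow: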

Adnan Anatolian

01/06/2011 04:17 pm

Mine was strangely removed today. I did not remove it; I just noticed it in Webmaster Tools. After reading this, I feel much more comfortable. :)

Michael Martinez

01/06/2011 05:19 pm

"I know many SEOs feel it is mandatory to have a robots.txt file and just have it say, User-agent: * Allow: /. Why when they will eat up your content anyway?" Because on those occasions when you want to add some directives, it's often easier to log in and edit an existing file than to upload a new one. Not every Website is hosted the same way.

gabs

01/06/2011 05:45 pm

404 errors were the main reason to have one in my book, but hey, what do I know. :)

Aashish Sahrawat

01/07/2011 05:12 am

Come on, we should not go down the wrong path. The robots.txt file is still important, because we need it to disallow pages and to use the removal tool. JohnMu was saying that if there are no pages you want to disallow, then you can remove robots.txt.
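
For anyone who does need that blocking behavior, a minimal sketch (the paths are hypothetical placeholders):

    User-agent: *
    Disallow: /checkout/
    Disallow: /search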

Keerthi

01/07/2011 07:17 am

If I need to exclude certain pages from the search results, what should I do? Will Google automatically identify that they are unwanted results?

Miles Carter

01/07/2011 10:05 am

If you ask me, XML sitemaps on established sites can be like this too.

Aashish Sahrawat

01/07/2011 11:32 am

Keerthi, your concern is right; that's why robots.txt will not go out of use. This post is just an example of link baiting.

Gudipudi

01/10/2011 05:38 am

I still believe the robots.txt file is mandatory when you want to block certain folders. After all, Google cannot decide which part of my content is important; I know my audience better.

LaurentB

01/10/2011 08:47 am

Do you check your server logs? Most likely there are tons of 404 errors from spiders looking for the robots.txt. Really, tons of them.

Aashish Sahrawat

01/11/2011 06:19 am

That's why I am saying that we need robots.txt; this post is just made to create traffic, such a diplomatic post. So, you should put your robots.txt file on the server.

LaurentB

01/11/2011 06:31 am

Of course. Robots.txt is the first file a spider looks for when accessing a website. John Mu from the Google Webmaster Help forums might not be very educated about server issues. When I see logs full of 404s because spiders can't find a robots.txt, no one can tell me that's good. Our job as SEOs is to ease the path for spiders. The first step is to put up a robots.txt (even an empty one). I would love to see the server logs of Barry's site, for instance. ;)

Daniel Chavez Moran

02/01/2011 05:58 pm

I agree, especially for smaller sites.

Sean

04/23/2011 11:11 pm

And so allowing Google to hammer your site looking for the 9,000,000,000 names of God is a good thing, instead of only letting it access the 90,000 canonical URLs listed in your sitemap index? So many wasted processor clock cycles that would be better spent serving your customers their pages.

DPW

04/20/2012 03:25 pm

Well, you can have it one of two ways: logs full of 404s for robots.txt, or logs full of 200s for robots.txt. And with the 404, you don't lose the (minuscule) bandwidth it takes to transfer the text file. Plus, if a server is so anemic that it can't handle sending 404 responses for a missing text file, it needs to be completely rebuilt.

cheryl

01/08/2013 03:49 am

I have a robots.txt disallow and I cannot find the file on my site. Where else could this file be? Could someone have listed my URL in a directory with the robots.txt as part of the URL?

Rebecca

08/13/2013 02:19 am

I need help. I need a robots.txt file that allows access to my occupational benefits website. Any suggestions? The webpage that keeps being blocked is https://www.webmdhealth.com/starbucks because apparently it has a robots file.

Ramesh Kondeti

09/24/2013 09:05 am

I have removed robots.txt from my site, but it is still showing blocked URLs: Http://www.krwebsolutions.in

tom

10/17/2013 09:20 pm

Where should I add the User-agent: * / Allow: / lines?

Webmaster Sun

02/04/2014 02:21 pm

I removed the robots block for my blog http://www.webmastersun.com/blog/ but have been waiting some days for Google to update. How can I make this process go faster?

srivathsaa

04/03/2014 04:34 am

I have removed the disallow directive, but I don't know how it will affect my site: www.freshersjobopenings.com

Micheal

04/13/2014 04:56 pm

404 responses are not a bad thing. They are a response that simply says, "Nope, the file is not here." A 200 response, as DPW points out, actually requires *more* from your server. I post this mainly because there are still people visiting this article (and commenting) every so often. 404s are nothing to be afraid of (unless you actually intended to have some content there).
