Will Google Crawl Your Site Without a Robots.txt File? It Depends

Jun 18, 2008 • 7:26 am | Filed Under Google Search Engine Optimization
 

I found a very interesting tidbit in a Google Groups thread on unreachable robots.txt files.

I always believed that a site does not need a robots.txt file. In fact, this site does not have a robots.txt file, and yet we are very well indexed. That is proof that you don't need a robots.txt file for Google to index your site. Right?

Well, maybe not. Googler JohnMu said in the Google Groups thread that if your robots.txt file is unreachable because of a timeout or some other problem (a 404 Not Found response does not count), Google "tends not to crawl the site at all just to be safe."

You hear that? Google might not crawl your site at all if it cannot reach your robots.txt file properly.

In the case in the thread, the robots.txt file was unreachable due to a complex set of redirects that made Googlebot very dizzy.
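To picture how a redirect chain can leave a crawler with nothing to fetch, here is a minimal Python sketch that follows redirects stored in a plain dictionary. The function name `resolve_redirects`, the example URLs, and the `max_hops` limit are all made up for illustration; the thread does not say how many redirects Googlebot will follow before giving up.

```python
def resolve_redirects(start, redirects, max_hops=5):
    """Follow a redirect chain and return the final URL, or None if we give up.

    `redirects` maps a URL to the URL it redirects to.
    `max_hops` is a hypothetical limit, not a documented Googlebot value.
    """
    seen = {start}
    url = start
    hops = 0
    while url in redirects:
        if hops == max_hops:
            return None  # chain is too long: give up
        url = redirects[url]
        hops += 1
        if url in seen:
            return None  # redirect loop: give up
        seen.add(url)
    return url  # no further redirect, so this URL can be fetched


# A short chain resolves; a loop does not.
simple = {
    "http://example.com/robots.txt": "http://www.example.com/robots.txt",
}
loop = {
    "http://example.com/robots.txt": "http://www.example.com/robots.txt",
    "http://www.example.com/robots.txt": "http://example.com/robots.txt",
}
print(resolve_redirects("http://example.com/robots.txt", simple))
print(resolve_redirects("http://example.com/robots.txt", loop))
```

In this sketch, a `None` result corresponds to the "gave up trying" situation, which is treated the same as an unreachable file.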

John explains later on that "unparsable robots.txt" files are "generally" okay, since Google is getting back some type of server response. The problem generally arises when "the URL is just unreachable (perhaps a 'security update' that ended up blocking us in general) or situations like this where we give up trying to access the URL (which in a way is unreachable as well)," said John.
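Put together, John's comments amount to a small decision table. Here is a rough Python sketch of it, based only on the statements quoted from the thread. The names `FetchOutcome` and `should_crawl` are hypothetical, and this models the tendencies John describes ("tends not to", "generally"), not Google's actual code or any guarantee.

```python
from enum import Enum


class FetchOutcome(Enum):
    """Hypothetical labels for what can happen when fetching /robots.txt."""
    OK = "ok"                    # file fetched and readable
    NOT_FOUND = "not_found"      # server answered with a 404
    UNPARSABLE = "unparsable"    # server answered, but the content is unusable
    UNREACHABLE = "unreachable"  # timeout, blocked, or gave up on redirects


def should_crawl(outcome):
    """Return True if, per the thread, the site would generally still be crawled.

    Assumption: any real server response (including a 404 or an unparsable
    file) is fine; only an unreachable file makes the crawler back off.
    """
    return outcome is not FetchOutcome.UNREACHABLE


if __name__ == "__main__":
    for outcome in FetchOutcome:
        print(outcome.value, should_crawl(outcome))
```

The point of the sketch is that a missing file and an unreachable file are different cases: the first still gets your site crawled, the second may not.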

So, for those picky Belgians, just make your robots.txt file unreachable, and there you go. :)

Forum discussion at Google Groups.
