Google: Can't Crawl Your Robots.txt? Then We Stop Crawling Your Site

Jan 3, 2014 • 7:49 am | Filed Under Google Search Engine Optimization
 

Did you know that if Google cannot crawl your robots.txt file, it will stop crawling your whole site?

This doesn't mean you need to have a robots.txt file; you can simply go without one. But if you do have one, and Google knows you do but cannot access it, then Google will stop crawling your site.

Google's Eric Kuan said this in a Google Webmaster Help thread. He wrote:

If Google is having trouble crawling your robots.txt file, it will stop crawling the rest of your site to prevent it from crawling pages that have been blocked by the robots.txt file. If this isn't happening frequently, then it's probably a one off issue you won't need to worry about. If it's happening frequently or if you're worried, you should consider contacting your hosting or service provider to see if they encountered any issues on the date that you saw the crawl error.

This also doesn't mean you can't block your robots.txt file from showing up in the search results; you can. But be careful with that.

In short, if your robots.txt file doesn't return either a 200 or a 404 response code, then you've got an issue.
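If you want to see where your own file stands, here is a minimal sketch of that check, assuming your site lives at the placeholder domain example.com: it fetches robots.txt and flags any status other than 200 or 404.

```python
# Minimal sketch: fetch robots.txt and flag any status other than 200 or 404.
# "example.com" is a placeholder; swap in your own domain.
import urllib.error
import urllib.request

ROBOTS_URL = "https://example.com/robots.txt"

def check_robots(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses raise HTTPError
    except urllib.error.URLError as err:
        print("Could not reach %s at all: %s" % (url, err.reason))
        return

    if status in (200, 404):
        print("%s returned %d - fine, Google keeps crawling." % (url, status))
    else:
        print("%s returned %d - Google may stop crawling your site." % (url, status))

check_robots(ROBOTS_URL)
```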

Forum discussion at Google Webmaster Help.

Comments:

Gridlock

01/03/2014 01:33 pm

A request for robots.txt must either return a 200 or 404, or Google won't crawl, correct.

Guest

01/03/2014 02:37 pm

For those unaware, WordPress generates its own virtual robots.txt file if one isn't added manually, which I've found can cause problems because it isn't always set up correctly. I'd recommend uploading a very basic one if you haven't already, or at least checking to make sure your site is being crawled correctly.
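For anyone following that advice, this is the sort of very basic robots.txt the commenter is describing: a permissive sketch that blocks nothing, which you would tailor to your own site before uploading it to your web root.

```
User-agent: *
Disallow:
```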

Fedor

01/03/2014 05:38 pm

There's no reason to even have one unless your site is really messed up. Blank robots.txt ftw.

Joe Williams

01/03/2014 11:10 pm

Seen that a few times - it's hardcore! Specifically, when you get a 500 internal server error on your robots.txt - you're in trouble deep! Seen big G de-index 1000+ pages of a site in a few days.

Joe Williams

01/03/2014 11:14 pm

I've seen the problem happen when you don't have a robots.txt but there is a site-wide 500 server error. That's super serious - to play it safe, it's worth setting up a ping response check (like Pingdom) on your robots.txt as well as your homepage.
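In that spirit, here is a rough, hypothetical stand-in for the kind of Pingdom-style check the commenter suggests, again using example.com as a placeholder: it polls the homepage and robots.txt and prints an alert on any server error, so you could run it from cron or any scheduler.

```python
# Rough stand-in for a Pingdom-style check: poll the homepage and robots.txt
# and warn on any 5xx response. "example.com" is a placeholder domain.
import urllib.error
import urllib.request

URLS = [
    "https://example.com/",            # homepage
    "https://example.com/robots.txt",  # robots.txt, per the comment above
]

for url in URLS:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    except urllib.error.URLError as err:
        print("ALERT: %s unreachable (%s)" % (url, err.reason))
        continue

    if status >= 500:
        print("ALERT: %s returned %d (server error)" % (url, status))
    else:
        print("OK: %s returned %d" % (url, status))
```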

Fedor

01/04/2014 01:08 am

If you're concerned about your site's uptime or coding issues, then you have bigger problems than the robots.txt file. It's probably time for a new host or a dedicated server.

Ehtesham Shaikh

01/06/2014 06:42 am

What about a 503 error? Will the crawler come back to the site to crawl again? And when will the crawler come back to index the site?

Soni Sharma

01/06/2014 07:55 am

Yes, many hosts return a 500 response code if the robots.txt file isn't uploaded, so you need to make sure it returns a 404 or a 200 instead. If you have recently changed your robots.txt file, it can take up to 48 hours for Google to crawl the new version.

Mike Lowry

01/06/2014 11:45 am

You can test your robots.txt file in Webmaster Tools and find out what errors it has, if any.

Andy

01/10/2014 11:10 am

I think the point with Pingdom is that it will notify you if your robots file starts erroring so you can fix it before Google deindexes the site.

Justin Spencer

01/14/2014 11:50 am

On December 26th and 28th, 2013, we got notifications about Googlebot being unable to crawl our website. We eventually noticed that, even with a robots.txt file in place, Googlebot tried to access pages that were disallowed. This resulted in 404 errors for more than 4,300 pages we had disallowed in robots.txt. We also experienced a massive drop in our traffic. What should we do?
