Google: Robots.txt Files Must Be Smaller Than 500KB

Jan 30, 2012 • 8:57 am | comments (11) | Filed Under Google Search Engine Optimization
 

Google's John Mueller reminds webmasters on his Google+ page that Google can only process the first 500KB of your robots.txt file.

This is an important point: if you have a very heavy robots.txt file that goes beyond 500KB, Googlebot will only read the first 500KB, and a directive can end up cut off mid-line. If Googlebot misreads your robots.txt as a result, it can cause serious issues with your site's health in the Google results.

Google's John Mueller said:

#102 of the things to keep in mind when working on a big website: If you have a giant robots.txt file, remember that Googlebot will only read the first 500kb. If your robots.txt is longer, it can result in a line being truncated in an unwanted way. The simple solution is to limit your robots.txt files to a reasonable size :-).

John links to this Google document on the robots.txt controls for more information.
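As a rough illustration of the limit John describes, here is a minimal Python sketch that checks whether a robots.txt file is over the limit and, if so, which line the cut-off would land on. It rests on assumptions: the function name `check_robots_size` is hypothetical, and the limit is taken as 500 * 1024 bytes, since the post only says "500KB" without giving an exact byte count. It does not reproduce how Googlebot actually parses the file.

```python
# Assumption: "500KB" is read as 500 * 1024 bytes; the post gives no exact byte count.
GOOGLEBOT_LIMIT_BYTES = 500 * 1024


def check_robots_size(content: bytes, limit: int = GOOGLEBOT_LIMIT_BYTES) -> dict:
    """Report whether robots.txt content exceeds `limit` bytes.

    If it does, also report the 1-indexed number of the first line that
    would not be read in full, and the part of that line that survives.
    """
    size = len(content)
    if size <= limit:
        return {
            "size": size,
            "over_limit": False,
            "cut_line_number": None,
            "cut_line_kept": None,
        }

    kept = content[:limit]
    # Every newline inside the kept bytes closes one complete line,
    # so the cut-off falls on the line after the last of them.
    cut_line_number = kept.count(b"\n") + 1
    # Whatever follows the last newline is the surviving fragment of that line.
    cut_line_kept = kept.rsplit(b"\n", 1)[-1].decode("utf-8", errors="replace")
    return {
        "size": size,
        "over_limit": True,
        "cut_line_number": cut_line_number,
        "cut_line_kept": cut_line_kept,
    }


if __name__ == "__main__":
    # Tiny limit used only to make the truncation visible in a short example.
    sample = b"User-agent: *\nDisallow: /private/\n"
    print(check_robots_size(sample, limit=20))
```

With the artificially small limit in the example, the second line survives only as `Disall`, which is the kind of unwanted truncation the quote warns about. To check a real file, read it in binary mode and pass the bytes to the function with the default limit.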

If you have any questions on Google's robots.txt handling, John is answering questions on his Google+ page.

Forum discussion at Google+.


Comments:

kaliseo

01/30/2012 05:26 pm

I think if you need 500k lines in robots.txt file... you've a big problem on your website ^^

Mathias

01/30/2012 08:52 pm

Yep as Thierry said, I don't see what you can put in it to reach 500k...

pervezalam

01/31/2012 04:52 am

For a big website with many millions of pages, e.g. Wikipedia, a news website like CNN, or a product website like eBay, there can be many types of conditions and directions for pages, so in this situation the file size can increase.

Sunny Ujjawal

01/31/2012 12:17 pm

Then what should you do if robots.txt is larger than 500 KB?

Barry Schwartz

01/31/2012 12:21 pm

Make multiple.

Sunny Ujjawal

01/31/2012 12:25 pm

Will Google accept robots-2.txt?

Barry Schwartz

01/31/2012 12:26 pm

Only if each one is in its own root directory, i.e. if the site is broken out into subdomains.

Shahul

02/01/2012 08:35 am

Commercial sites update regularly, and old offers that become invalid get added to the no-follow list in robots.txt. This automatically increases the file size.

Shahul

02/01/2012 08:39 am

Not necessarily. These methods may necessarily be a black hat but not healthy!

Shahul

02/01/2012 08:40 am

 Will consider it, but according to Matt it confuses the bot!

DESROD

07/06/2012 02:08 pm

It's just 697 bytes...

User-agent: *
Disallow: (bad url 1)
Disallow: (bad url 2)
Disallow: (bad url 3)
Disallow: (bad url 4)
Disallow: (bad url 5)
Disallow: (bad url 6)
Disallow: (bad url 7)
Disallow: (bad url 8)
Disallow: (bad url 9)
Disallow: (bad url 10)
Disallow: (bad url 11)
Disallow: (bad url 12)
Disallow: (bad url 13)
Disallow: (bad url 14)
Disallow: (bad url 15)
Disallow: (bad url 16)
Disallow: (bad url 17)
Disallow: (bad url 18)
Disallow: (bad url 19)
Disallow: (bad url 20)
Allow: (good url 1)
Allow: (good url 2)
Allow: (good url 3)
Allow: (good url 4)
Allow: (good url 5)
Allow: (good url 6)
Allow: (good url 7)
Allow: (good url 8)
Allow: (good url 9)
Allow: (good url 10)
