Google Sitemaps Robots.txt Validator Does Not Properly Validate

Mar 23, 2006 - 8:00 am

Shawn Hogan of DigitalPoint wrote a blog entry titled "Google Not Interpreting robots.txt Consistently." He describes how he noticed that some of his pages were being crawled by Googlebot even though his robots.txt file specifically blocked them. So he emailed Google, and they actually replied with the following message:

While we normally don't review individual sites, we did examine your robots.txt file. Please be advised that it appears your Googlebot entry in your robots.txt file is overriding your generic User-Agent listing. We suggest you alter your robots.txt file by duplicating the forbidden paths under your Googlebot entry:

User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /~shawn/ebay_
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/

Once you've altered your robots.txt file, Google will find it automatically after we next crawl your site.

Fine, so Shawn can easily do that. It is not a major deal, just a behavior Google knows about in how it handles robots.txt. But what Shawn points out is that the Google Sitemaps robots.txt validator reports something different for his previous robots.txt file:

User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/
User-agent: Googlebot
Disallow: /~shawn/ebay_

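According to the validator, this file would keep Googlebot out of the /ads/ directory, yet the real crawler ignores the generic rules once a Googlebot entry exists. The validator and the crawler are not consistent, and they obviously should be.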
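To make the precedence rule in Google's reply concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Google's parser or the Sitemaps validator; it simply happens to apply the same "a matching named entry replaces the generic one" rule. The `allowed` helper and the `/ads/banner.html` and `/~shawn/ebay_listing` paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shawn's previous file, as quoted in the post.
ORIGINAL = """\
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/
User-agent: Googlebot
Disallow: /~shawn/ebay_
"""

# The file Google suggested: generic paths duplicated under Googlebot.
REVISED = """\
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/

User-agent: Googlebot
Disallow: /~shawn/ebay_
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
Disallow: /ads/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Hypothetical helper: parse robots.txt text and test one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("original", ORIGINAL), ("revised", REVISED)):
        for agent in ("Googlebot", "SomeOtherBot"):
            verdict = allowed(text, agent, "/ads/banner.html")
            print(f"{label:8} {agent:12} /ads/banner.html allowed={verdict}")
```

With the original file, this parser lets Googlebot into /ads/ while still blocking other crawlers, which matches what Shawn saw in his logs rather than what the validator reported. With the revised file, Googlebot is blocked as intended.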

Forum discussion at DigitalPoint Forums.

 
