Entries from Search Engine Roundtable tagged with 'robots.txt'

Should Google Not Index Robots.txt Files in Search Results?

An interesting discussion is taking place at WebmasterWorld on the topic of the robots.txt file. One webmaster did not want his robots.txt file to be indexed by Google, but has no way of delisting it in Google. The only ways...

Google Recommends robots.txt Sitemap Submission During Glitch

Webmasters at Google Groups are reporting that some verified sites' sitemap URLs are not being accepted by Google Webmaster Tools. The error some webmasters are receiving is: "URL not allowed - This url is not allowed for a Sitemap at...

Google Adds Robots.txt Generator to Webmaster Tools

Want an official Google robots.txt generator? You have one. Yesterday, the Google Webmaster Central blog announced the launch of a new tool in Google Webmaster Central, the robots.txt generator. Here's what it looks like: You'll then need to download it...

Andy Beard Blocks Googlebot with Robots.txt

This past weekend, Andy Beard did what many people thought unthinkable. In his blog post, he says that he's blocked Google from crawling paid reviews on his site. His reasoning is clear: I have spent a long time deciding on...

Yahoo Expands X-Robots-Tag: Supports NOINDEX, NOARCHIVE, NOSNIPPET, and NOFOLLOW

Yahoo recently announced that they are supporting four new types of exclusion directives in the X-Robots-Tag HTTP header: NOINDEX, NOARCHIVE, NOSNIPPET, and NOFOLLOW. Being able to declare these directives in an HTTP header helps folks who store PDFs,...
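
X-Robots-Tag is an HTTP response header, so it can carry these directives for files that have no HTML head to hold a robots meta tag. Below is a minimal sketch of how a server might choose the header value. The helper name robots_headers, the file extensions, and the directive choices are all assumptions made for illustration; none of them come from Yahoo's announcement.

```python
import posixpath

# Hypothetical policy: which directives to send for which file types.
# Both the extensions and the directive choices are illustrative assumptions.
DIRECTIVES_BY_EXTENSION = {
    ".pdf": ["noindex", "noarchive"],
    ".doc": ["nosnippet"],
}


def robots_headers(path):
    """Return the extra HTTP response headers to send for a requested path."""
    extension = posixpath.splitext(path)[1].lower()
    directives = DIRECTIVES_BY_EXTENSION.get(extension)
    if not directives:
        # No policy for this file type: send no X-Robots-Tag header at all.
        return {}
    return {"X-Robots-Tag": ", ".join(directives)}


if __name__ == "__main__":
    print(robots_headers("/reports/annual.pdf"))
    print(robots_headers("/index.html"))
```

A web server or framework would merge the returned dictionary into the response headers for the file being served.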

Will Webmasters Adopt The New Robots.txt Proposal (Automated Content Access Protocol)?

Yesterday the Automated Content Access Protocol group released their ACAP Technical Framework (Extension of robots.txt format PDF). Danny does an excellent job explaining the implications for search engines with his ACAP Launches, Robots.txt 2.0 For Blocking Search Engines? In short,...

Double Check Your Robots.txt: Google Testing New Crawler Directives

Validate your robots.txt - Googlebot becomes smarter from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...

Google Can Find Your Robots.txt File

Guess what? If you thought your robots.txt file wasn't being spidered by search engines, think again. Try the following in Google: "robots.txt" "disallow:" filetype:txt. I'm sure you'll find a significant number of results. Are you concerned about what content is...

Blogger.com Users Report Abnormal Google Webmaster Tools Activity with Robots.txt

A number of WebmasterWorld members who use Google's Blogspot application are reporting that when they log into Google Webmaster Tools, they're getting a lot of errors that many URLs are being restricted by robots.txt. However, blogspot.com users have no control...

Preventing Google from Caching PDF Files

In July, I wrote asking for ideas on how to prevent "View as HTML" links from appearing on PDF files. In other words, authors of PDF files don't want them to be cached. A DigitalPoint Forums member seems to have...

Is a Robots.txt File Required for Search Engine Optimization?

A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...
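
The point can be checked with Python's standard urllib.robotparser module. This is a sketch, assuming a hypothetical helper can_crawl and a made-up crawler name ExampleBot; an empty rule list stands in for a site that publishes no rules.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, user_agent, url):
    """Parse robots.txt lines and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    # No rules at all: nothing is disallowed, so crawling is permitted.
    print(can_crawl([], "ExampleBot", page))
    # An explicit blanket Disallow is what actually blocks a crawler.
    print(can_crawl(["User-agent: *", "Disallow: /"], "ExampleBot", page))
```

The file only matters when there is something to exclude; leaving it out does not hold a crawler back.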

Managing the Robots.txt File for Sites Sharing Same Local Files

A Cre8asite Forums thread asks how to generate a unique robots.txt file for each domain when all of those sites share the same local files through a form of IIS mirroring. There are several ways to do...

Google Webmaster Central Improves robots.txt Tool

As Barry reported on Search Engine Land and as the official Google Webmaster Central blog announced yesterday, Google is working to add functionality that makes webmastering a lot easier. With this update, Google announces that it has enhanced its...

Google Expands Robots Exclusion Protocol, Unavailable After Tag Now Live

As we've discussed before, Google has been planning an unavailable_after tag that would enable webmasters to alert Google to when content should no longer be crawled by the Googlebot. A Google Blog post by Dan Crow announces that this feature...

How Does robots.txt Behave on Domains and Subdomains?

In a Google Groups thread, a member has set up a subdomain on a domain and has duplicate content on the main domain: in a nutshell, domainB.domainA.com and domainB.com are pointing to the same content. To avoid duplicate content issues,...
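
Crawlers look for robots.txt at the root of each host, so a subdomain and a separate domain are each asked for their own file even when they serve the same content. Below is a small sketch of that lookup; the function name robots_txt_url is hypothetical, and the hostnames are the placeholders from the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location a crawler would check for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://domainB.domainA.com/some/page.html"))
    print(robots_txt_url("http://domainB.com/some/page.html"))
```

The two calls yield different URLs, which is why the two hosts can be given different crawl rules if the server can answer each request with a different file.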

Google Still Displaying Their Own Search Results in Google.com

Danny called out Google for indexing their own product search results on April 30th. Since then Google updated their robots.txt file to include an exclusion of /products?. A German forum, ABAKUS Forum (remember Alan Webb?) spotted Google displaying Google Product...

New Tag Cloud at Search Engine Roundtable: Should Tag Clouds be Blocked from Search Engines?

This past Sunday I launched a new element for locating content at the Search Engine Roundtable: the Search Engine Roundtable Tag Cloud. Ever since we upgraded to the new version of Movable Type, which supported tags, I have...

Google Product Search Results in Google: An Oversight

A few days ago, Danny spotted Google Product Search Results in the regular Google SERPs. The problem with this is that Google has asked others not to do this in the past, but they apparently weren't practicing what they preached....

A Robots.txt File Marked As Supplemental By Google?

In a WebmasterWorld thread, someone reports noticing that a robots.txt file has been indexed and marked as supplemental at Google.com. It is clear that Google indexes robots.txt files. The thread creator wants to know what it means...

Yahoo! Slurping Wildcards Via Robots.txt File

Thursday night, last week, the Yahoo! Search Blog wrote Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt. I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...

Wikipedia Claims Google Has Content Targeting like Tags for Crawling?

I am not sure if the title of this post is all that clear. Basically, if you go to the Robots.txt informational page at Wikipedia and scroll down to "Directives within a page", it claims Google has...

GoogleGuy Explains Robots.txt Handling

A featured WebmasterWorld thread has examples of issues with Google possibly disobeying the robots.txt file. GoogleGuy and VanessaFox both come in to offer some guidance on the perceived issue. GoogleGuy first explains that "a more specific directive takes precedence over...

