An interesting discussion is taking place at WebmasterWorld on the topic of the robots.txt file. One webmaster did not want his robots.txt file to be indexed by Google, but has no way of delisting it in Google. The only ways...
Webmasters at Google Groups are reporting that some verified sites' sitemap URLs are not being accepted by Google Webmaster Tools. The error some webmasters are receiving is: "URL not allowed - This url is not allowed for a Sitemap at...
Want an official Google robots.txt generator? You have one. Yesterday, the Google Webmaster Central blog announced the launch of a new tool in Google Webmaster Central, the robots.txt generator. Here's what it looks like: You'll then need to download it...
This past weekend, Andy Beard did what many people thought unthinkable. In his blog post, he says that he's blocked Google from crawling paid reviews on his site. His reasoning is clear: I have spent a long time deciding on...
Yahoo recently announced that they are supporting four new types of exclusion tags in the robots.txt file: NOINDEX, NOARCHIVE, NOSNIPPET, and NOFOLLOW. Being able to declare these directives in the robots.txt file enables folks who store PDFs,...
Yesterday the Automated Content Access Protocol group released their ACAP Technical Framework (Extension of robots.txt format PDF). Danny does an excellent job explaining the implications for search engines with his ACAP Launches, Robots.txt 2.0 For Blocking Search Engines? In short,...
Validate your robots.txt - Googlebot becomes smarter from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...
Guess what? If you thought your robots.txt file wasn't being spidered by search engines, think again. Try the following in Google: "robots.txt" "disallow:" filetype:txt I'm sure you'll find a significant number of results. Are you concerned about what content is...
A number of WebmasterWorld members who use Google's Blogspot application are reporting that when they log into Google Webmaster Tools, they're getting a lot of errors that many URLs are being restricted by robots.txt. However, blogspot.com users have no control...
In July, I wrote asking for ideas on how to prevent "View as HTML" links from appearing on PDF files. In other words, authors of PDF files don't want them to be cached. A DigitalPoint Forums member seems to have...
A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...
A Cre8asite Forums thread asks how a webmaster can generate a unique robots.txt file for each of his domains when all of those sites share the same local files through a form of IIS mirroring. There are several ways to do...
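One possible approach, sketched here in Python rather than IIS and not taken from the thread itself, is to generate robots.txt dynamically and choose its body from the requested host name. The host names, rules, and function names below are hypothetical placeholders.

```python
# A minimal sketch: several domains share the same files, but each host
# receives its own robots.txt. Host names and rules are illustrative only.
ROBOTS_BY_HOST = {
    "www.example-one.test": "User-agent: *\nDisallow: /private/\n",
    "www.example-two.test": "User-agent: *\nDisallow:\n",
}
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"  # allow everything by default


def robots_for_host(host_header: str) -> str:
    """Return the robots.txt body for the host named in the Host header."""
    # Drop an optional ":port" suffix and normalize case before the lookup.
    host = host_header.split(":", 1)[0].strip().lower()
    return ROBOTS_BY_HOST.get(host, DEFAULT_ROBOTS)


def app(environ, start_response):
    """Tiny WSGI app: serves /robots.txt per host, 404 for anything else."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    body = robots_for_host(environ.get("HTTP_HOST", "")).encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

The same idea carries over to any server that can route /robots.txt to a script or rewrite it per host name; the lookup table is the only part that differs between domains.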
As Barry reported on Search Engine Land and as the official Google Webmaster Central blog announced yesterday, Google is working to add the functionality to make webmastering a lot easier. With this update, Google announces that it has enhanced their...
As we've discussed before, Google has been planning an unavailable_after tag that would enable webmasters to alert Google to when content should no longer be crawled by the Googlebot. A Google Blog post by Dan Crow announces that this feature...
In a Google Groups thread, a member has set up a subdomain on a domain and has duplicate content on the main domain: in a nutshell, domainB.domainA.com and domainB.com are pointing to the same content. To avoid duplicate content issues,...
Danny called out Google for indexing their own product search results on April 30th. Since then Google updated their robots.txt file to include an exclusion of /products?. A German forum, ABAKUS Forum (remember Alan Webb?) spotted Google displaying Google Product...
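To illustrate what an exclusion such as /products? covers, here is a minimal sketch of the plain prefix matching that the original robots.txt convention uses for Disallow lines. The function name and sample URLs are made up for the example, and real crawlers may apply additional rules.

```python
def is_disallowed(path_and_query: str, disallow_prefixes: list[str]) -> bool:
    """Return True if the URL path (plus query string) starts with any
    non-empty Disallow prefix. An empty Disallow value blocks nothing."""
    return any(p and path_and_query.startswith(p) for p in disallow_prefixes)


rules = ["/products?"]  # the exclusion mentioned in the post

print(is_disallowed("/products?q=shoes", rules))  # query-string results page
print(is_disallowed("/products/about", rules))    # different path, no "?"
```

Because the rule ends in a question mark, it only catches URLs where the query string follows /products directly, which is what a search results page looks like.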
This past Sunday I launched a new element for locating content at the Search Engine Roundtable: the Search Engine Roundtable Tag Cloud. Ever since we upgraded to the new version of Movable Type, which supported tags, I have...
A few days ago, Danny spotted Google Product Search Results in the regular Google SERPs. The problem with this is that Google has asked others not to do this in the past, but they apparently weren't practicing what they preached....
A WebmasterWorld thread has someone reporting that he noticed a robots.txt file has been indexed and marked in the supplemental index at Google.com. It is clear that Google indexes robots.txt files. The thread creator wants to know what it means...
Thursday night, last week, the Yahoo! Search Blog wrote Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt. I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...
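For readers who have not used robots.txt wildcards, the sketch below shows how such patterns are commonly interpreted, assuming the two special characters generally associated with this kind of support: `*` for any run of characters and a trailing `$` to anchor the end of the URL. The helper names are hypothetical, and this is not Yahoo's actual implementation.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and a trailing $ to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each * stand for any characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def matches(pattern: str, path: str) -> bool:
    """True if the path matches the pattern, compared from the start of the path."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/docs/report.pdf"))      # ends in .pdf
print(matches("/*.pdf$", "/docs/report.pdf?x=1"))  # does not end in .pdf
print(matches("/private*/", "/private-area/page"))
```

Without the trailing `$`, a pattern still behaves like an ordinary prefix rule, so existing robots.txt files keep working unchanged.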
I am not sure if the title of this post is all that clear. Basically, if you go to the Robots.txt informational page at Wikipedia and scroll down to the Directives within a page section, it claims Google has...
A featured WebmasterWorld thread has examples of issues with Google possibly disobeying the robots.txt file. GoogleGuy and VanessaFox both come in to offer some guidance on the perceived issue. GoogleGuy first explains that "a more specific directive takes precedence over...