Over the past few weeks, I have been noticing threads pop up in the Bing forums with complaints from webmasters that Bing's bot, aka MSNBot, is not honoring their robots.txt directives. It was not just one thread, but at least...
JohnMu from Google posted in a Google Webmaster Help thread that Google typically crawls a site's robots.txt file on a daily basis. This is the first time (at least that I can remember) I have seen a Googler make a...
A HighRankings Forum thread asks why some people use more than a single robots.txt file to control and instruct search spiders how to crawl and access their content. That is a good question. Typically, the spiders will only listen...
In many cases, when people have trouble getting their sites to do well in the search engines, they overlook the obvious. I cannot tell you how many times I have seen or heard webmasters complaining...
It seems like we have confirmed reports from a Googler in Google Groups that Google's video crawler, part of the GoogleBot family, is not playing nice. In short, even though you may be telling Google not to crawl your videos,...
I found an interesting tidbit while reading a somewhat detailed thread at Google Groups. The scenario is as follows. You have blocked Googlebot from accessing your site for a six-month period or so. Then you want to welcome Googlebot...
In case you didn't know, you can upgrade your Feedburner URL to feedproxy.google.com. However, some people noticed that feedproxy.google.com is actually disallowing robots.txt -- odd, huh? In fact, the problem has been breaking some feeds. Fortunately, Google has been paying...
A Google Groups thread shows the tale of a webmaster who had issues with his robots.txt file. The robots.txt file was uploaded with a byte-order mark (BOM) at the start of the file, which threw off Google when trying to retrieve and understand...
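As a rough illustration of the BOM problem described above, here is a minimal Python sketch that detects and strips a leading UTF-8 byte-order mark from robots.txt content. It is not a description of how Google handles the file; the function name `strip_utf8_bom` and the sample bytes are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical robots.txt content saved by an editor that prepends a BOM.
raw = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /private/\n"

clean = strip_utf8_bom(raw)
# With the BOM removed, the first line starts with the directive name itself.
first_line = clean.decode("utf-8").splitlines()[0]
```

Running a check like this before uploading the file is one way to confirm that the first directive is not preceded by invisible bytes.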
I found a very interesting tidbit from a Google Groups thread on unreachable robots.txt files. I always believed that a site does not need a robots.txt file. In fact, this site does not have a robots.txt file and yet we...
An interesting discussion is taking place at WebmasterWorld on the topic of the robots.txt file. One webmaster did not want his robots.txt file to be indexed by Google, but had no way of delisting it in Google. The only ways...
Webmasters at Google Groups are reporting that some verified sites' sitemap URLs are not being accepted by Google Webmaster Tools. The error some webmasters are receiving is: "URL not allowed - This url is not allowed for a Sitemap at...
Want an official Google robots.txt generator? You have one. Yesterday, the Google Webmaster Central blog announced the launch of a new tool in Google Webmaster Central, the robots.txt generator. Here's what it looks like: You'll then need to download it...
This past weekend, Andy Beard did what many people thought unthinkable. In his blog post, he says that he's blocked Google from crawling paid reviews on his site. His reasoning is clear: I have spent a long time deciding on...
Yahoo recently announced that they are supporting four new types of exclusion tags in the robots.txt file: NOINDEX, NOARCHIVE, NOSNIPPET, and NOFOLLOW. Being able to declare these directives in the robots.txt file enables folks who store PDFs,...
Yesterday the Automated Content Access Protocol group released their ACAP Technical Framework (Extension of robots.txt format PDF). Danny does an excellent job explaining the implications for search engines with his ACAP Launches, Robots.txt 2.0 For Blocking Search Engines? In short,...
Validate your robots.txt - Googlebot becomes smarter from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...
Guess what? If you thought your robots.txt file wasn't being spidered by search engines, think again. Try the following in Google: "robots.txt" "disallow:" filetype:txt I'm sure you'll find a significant number of results. Are you concerned about what content is...
A number of WebmasterWorld members who use Google's Blogspot application are reporting that when they log into Google Webmaster Tools, they're getting a lot of errors that many URLs are being restricted by robots.txt. However, blogspot.com users have no control...
In July, I wrote asking for ideas on how to prevent "View as HTML" links from appearing on PDF files. In other words, authors of PDF files don't want them to be cached. A DigitalPoint Forums member seems to have...
A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...
A Cre8asite Forums member asks how he can generate a unique robots.txt file for each of his domains when those sites all share the same local files through a form of IIS mirroring. There are several ways to do...
As Barry reported on Search Engine Land and as the official Google Webmaster Central blog announced yesterday, Google is working to add the functionality to make webmastering a lot easier. With this update, Google announces that it has enhanced its...
As we've discussed before, Google has been planning an unavailable_after tag that would enable webmasters to alert Google to when content should no longer be crawled by the Googlebot. A Google Blog post by Dan Crow announces that this feature...
In a Google Groups thread, a member has set up a subdomain on a domain and has duplicate content on the main domain: in a nutshell, domainB.domainA.com and domainB.com are pointing to the same content. To avoid duplicate content issues,...
Danny called out Google for indexing their own product search results on April 30th. Since then Google updated their robots.txt file to include an exclusion of /products?. A German forum, ABAKUS Forum (remember Alan Webb?) spotted Google displaying Google Product...
This past Sunday I launched a new element for locating content at the Search Engine Roundtable: the Search Engine Roundtable Tag Cloud. Ever since we upgraded to the new version of Movable Type, which supported tags, I have...
A few days ago, Danny spotted Google Product Search Results in the regular Google SERPs. The problem with this is that Google has asked others not to do this in the past, but they apparently weren't practicing what they preached....
A WebmasterWorld thread has someone reporting that he noticed a robots.txt file has been indexed and marked in the supplemental index at Google.com. It is clear that Google indexes robots.txt files. The thread creator wants to know what it means...
Thursday night, last week, the Yahoo! Search Blog wrote Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt. I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...
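To make the idea of wildcards in robots.txt concrete, here is a minimal Python sketch of pattern matching against URL paths. It assumes the two wildcard characters are `*` (any sequence of characters) and `$` (end of the URL), and it is not Yahoo's implementation; the function names `robots_pattern_to_regex` and `path_matches` and the sample paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the path is covered by the pattern (prefix match by default)."""
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical examples of how such patterns would behave under these assumptions.
blocks_pdf = path_matches("/*.pdf$", "/docs/report.pdf")
blocks_pdf_with_query = path_matches("/*.pdf$", "/docs/report.pdf?page=2")
blocks_prefix = path_matches("/private*", "/private/notes.html")
```

Without the trailing `$`, a pattern behaves as a prefix match, which is why the anchored PDF pattern does not cover the URL with a query string.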
I am not sure if the title of this post is all that clear. Basically, if you go to the Robots.txt informational page at Wikipedia and scroll down to the "Directives within a page" section, it claims Google has...
A featured WebmasterWorld thread has examples of issues with Google possibly disobeying the robots.txt file. GoogleGuy and VanessaFox both come in to offer some guidance on the perceived issue. GoogleGuy first explains that "a more specific directive takes precedence over...