We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...
A WebmasterWorld thread has an advertiser complaining that Google's AdWords spider appears to be lowercasing their destination URLs. The thing is, this webmaster's site does not work with the lowercase URLs, and they don't have the time...
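If waiting on Google is not an option, one possible server-side workaround (our assumption, not something suggested in the thread) is to 301-redirect a lowercased request to the real mixed-case URL. Here is a minimal Python sketch, assuming the site can list its real paths up front; the paths in `CANONICAL_PATHS` and the `resolve` function are hypothetical:

```python
# Hypothetical real paths on the site; note they are mixed-case.
CANONICAL_PATHS = {"/Products/BlueWidget.html", "/Offers/SpringSale.aspx"}

# Index each real path by its lowercased form.
BY_LOWERCASE = {path.lower(): path for path in CANONICAL_PATHS}


def resolve(path):
    """Return (status, path) for a requested URL path.

    200 if the path exists exactly as requested, 301 to the real path
    if only the letter case differs, 404 otherwise.
    """
    if path in CANONICAL_PATHS:
        return 200, path
    real = BY_LOWERCASE.get(path.lower())
    if real is not None:
        return 301, real
    return 404, None
```

A lowercased AdWords click then reaches the working page through a single redirect. The lookup does break down if two real paths differ only by case.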
Last December, we reported that MSNBot was failing a reverse DNS lookup. Well, guess what folks - MSNBot is failing again on some IP addresses. An updated WebmasterWorld thread brought this to my attention and I verified it myself. Here...
The Live Search Blog announced several updates to their crawler. The first is a name change to reflect the upgrade: the crawler previously named msnbot/1.0 is now named msnbot/1.1. The bulk of the changes include HTTP compression and conditional GET...
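For webmasters wondering what those two features look like on the wire, here is a minimal Python sketch of a client doing a conditional GET with gzip compression. It only illustrates the standard HTTP mechanism (the `If-Modified-Since`/`If-None-Match` request headers and a `304 Not Modified` reply), not Microsoft's crawler code; the function names are hypothetical:

```python
import gzip
import urllib.error
import urllib.request


def build_conditional_headers(last_modified=None, etag=None):
    """Request headers asking for gzip and, if we have them, a conditional GET."""
    headers = {"Accept-Encoding": "gzip"}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def decode_body(body, content_encoding):
    """Undo gzip transfer compression if the server applied it."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body


def fetch_if_changed(url, last_modified=None, etag=None):
    """Return (status, body, last_modified, etag); body is None on a 304."""
    request = urllib.request.Request(
        url, headers=build_conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request) as resp:
            body = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
            return (resp.status, body,
                    resp.headers.get("Last-Modified"), resp.headers.get("ETag"))
    except urllib.error.HTTPError as err:
        # urllib raises on 304; it just means "unchanged since last crawl".
        if err.code == 304:
            return 304, None, last_modified, etag
        raise
```

On a second crawl, passing back the stored `Last-Modified` and `ETag` values lets an unchanged page answer with a bodiless 304, which is where the bandwidth saving comes from.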
Here is an unusual scenario for you. You have a web page that is one of many on your site. This specific page does not allow search engines to crawl it, by using a robots.txt file to...
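For reference, this is how a compliant crawler decides whether it may fetch that page. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the example.com URLs and the `crawl_allowed` helper are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a single page for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, agent="Googlebot"):
    """True if the given user-agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)
```

The blocked page is refused while every other page on the site stays crawlable.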
Just a tidbit based on a Google Groups thread: using Google's Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...
A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are blocking all search engines by default. The poster said that when you go to the Privacy Options section in the administration panel, by default, it is set to...
I noticed an update to the WebmasterWorld thread discussing the weird, spam-like referrals coming from Live Search as cloaking tests. It appears a webmaster is now noticing a bot named MSLIVSOP serving...
Earlier this week, we reported that Ask.com's crawler was inserting URL-encoded spaces into URLs. In short, Ask.com's crawlers were crawling badly formed URLs, causing tons of 404 errors in web server log files. Vivek Pathak, Ask.com's Infrastructure Product...
A WebmasterWorld thread reports that several webmasters have noticed Ask.com's crawler recently generating tons of 404 (file not found) errors on their sites. The issue appears to stem from Ask.com automatically inserting URL-encoded spaces into the URL. URL-encoded...
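To see why those requests fail, compare the paths: a stray `%20` decodes to a space, so `/%20index.html` is a request for `/ index.html`, a path that does not exist. Here is a minimal sketch of a server-side cleanup that 301-redirects such requests, assuming the site's real URLs never contain spaces; `KNOWN_PATHS` and `repair_path` are hypothetical:

```python
from urllib.parse import unquote

# Hypothetical set of paths that really exist on the site.
KNOWN_PATHS = {"/index.html", "/articles/robots-txt-tips.html"}


def repair_path(requested):
    """Return (status, path) for a request that may contain stray %20s.

    Assumes no real URL on the site contains a space, so any encoded
    space is treated as crawler-inserted noise.
    """
    if requested in KNOWN_PATHS:
        return 200, requested
    # Decode percent-escapes (%20 -> " "), then drop the spaces.
    candidate = unquote(requested).replace(" ", "")
    if candidate in KNOWN_PATHS:
        return 301, candidate
    return 404, None
```

Redirecting instead of returning 404 keeps the log files clean while the crawler bug is sorted out.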
A year ago, Microsoft promised to give webmasters a method of verifying MSNbot. Far too often, rogue spiders disguise themselves as official spiders from Google, Yahoo, Live Search or Ask.com. The search engines have enabled methods to conduct reverse DNS...
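The verification the search engines describe is a forward-confirmed reverse DNS check: look up the hostname for the visiting IP, confirm it sits in the engine's domain, then resolve that hostname back and confirm it returns the same IP. Here is a minimal Python sketch; the domain suffixes in `CRAWLER_DOMAINS` are assumptions to check against each engine's own documentation, `verify_crawler` is a hypothetical name, and the resolvers are parameters so the logic can be tested offline:

```python
import socket

# Assumed hostname suffixes per crawler; verify against each engine's docs.
CRAWLER_DOMAINS = {
    "msnbot": (".search.msn.com",),
    "googlebot": (".googlebot.com", ".google.com"),
}


def verify_crawler(ip, bot, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IP claiming to be `bot`."""
    suffixes = CRAWLER_DOMAINS.get(bot)
    if not suffixes:
        return False
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record: the reverse lookup fails
    if not hostname.endswith(suffixes):
        return False  # hostname is outside the engine's domain
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The hostname must resolve back to the very IP that visited.
    return ip in addresses
```

A spoofer can put "msnbot" in the User-Agent string, and can even set the reverse record for an IP block it controls, but cannot make a Microsoft hostname resolve back to that IP, which is why the forward step matters.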
Sebastian's post "Validate your robots.txt - Googlebot becomes smarter" reports official confirmation from Google that they are testing new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...
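Since a single stray line can now drop a whole site, it is worth scanning robots.txt before uploading it. Here is a minimal Python sketch that lists every `Noindex:` line and flags a site-wide `Noindex: /`. It assumes the directive follows the usual `Field: value` robots.txt syntax and ignores user-agent grouping; the function names are hypothetical:

```python
def noindex_rules(robots_txt):
    """Return (line_number, path) for every Noindex: line in a robots.txt."""
    rules = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "noindex":
            rules.append((number, value.strip()))
    return rules


def deindexes_whole_site(robots_txt):
    """True if any Noindex line targets the site root."""
    return any(path == "/" for _, path in noindex_rules(robots_txt))
```

Running this as part of a deploy step catches the mistake before a crawler ever sees the file.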
A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...
A Cre8asite Forums thread asks how he can generate unique robots.txt files for each of his domains when all of those sites share the same local files through a form of IIS mirroring. There are several ways to do...
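One common approach (not necessarily one from the thread) is to stop serving robots.txt as a static file and generate it per request from the Host header. Here is a minimal sketch as a Python WSGI app; the same idea applies to a server-side script on IIS. The hostnames and rules in `ROBOTS_BY_HOST` and the function names are hypothetical:

```python
# Hypothetical per-domain robots.txt bodies.
ROBOTS_BY_HOST = {
    "www.example.com": "User-agent: *\nDisallow:\n",
    "www.example.net": "User-agent: *\nDisallow: /\n",
}
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_for_host(host):
    """Pick the robots.txt body for the requested Host header."""
    hostname = host.split(":", 1)[0].lower()  # strip any :port suffix
    return ROBOTS_BY_HOST.get(hostname, DEFAULT_ROBOTS)


def app(environ, start_response):
    """Minimal WSGI app that serves /robots.txt based on the Host header."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = robots_for_host(environ.get("HTTP_HOST", "")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because the file is chosen at request time, all mirrored sites can keep sharing the same files on disk.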
Yesterday afternoon, Yahoo! announced support for a new attribute that webmasters and SEOs can use on their pages to help the search spiders determine which content on the page is most important, by excluding extraneous or irrelevant...
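Assuming the attribute in question is Yahoo!'s `class="robots-nocontent"` marker (the excerpt cuts off before naming it), here is a minimal Python sketch of how a crawler might drop marked sections before indexing a page's text. It uses the standard `html.parser` module; the class and function names are hypothetical, and it assumes reasonably well-formed markup, since an unclosed tag inside a marked section would throw off the depth count:

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collect page text, skipping elements marked class="robots-nocontent"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Navigation links, footers and ad blocks marked this way stay visible to visitors but drop out of the text the crawler weighs.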
An Adam Lasnik post in Google Groups prompted a thread at Cre8asite Forums explaining that if you have bad HTML, Google will be OK with it. Yes, that is the case; your code does not need to be 100% validated...
We all know about PPC fraud and that some of it is caused by bots (robots) that click on the ads, driving up your bill and sending unwanted traffic. But it gets more serious than that. Bots are also...
Last Thursday night, the Yahoo! Search Blog published "Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt." I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...
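For anyone who has not read the post: in the wildcard syntax Yahoo! and Google describe, `*` in a robots.txt path matches any run of characters and a trailing `$` anchors the pattern to the end of the URL. Here is a minimal Python sketch of that matching, translating a pattern into a regular expression; the function names are hypothetical, and it does not decide between conflicting Allow and Disallow rules:

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, joining them with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern, path):
    """True if a URL path matches a Disallow/Allow pattern (prefix match)."""
    return pattern_to_regex(pattern).match(path) is not None
```

So `Disallow: /*.pdf$` blocks `/docs/report.pdf` but not `/docs/report.pdf?x=1`, while a pattern without `$` still behaves as the familiar prefix match.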
A Search Engine Watch Forums thread asks how one can prevent a non-authorized spider from scraping a site's content without hurting its rankings in search engines. This is a serious issue, serious enough that there was a session...
How cute, seriously: MSN has finally given names to its baby crawlers. You know, Google names its crawlers (e.g. Googlebot, MediaBot, etc.), and Yahoo has Slurp, etc. Now MSN has named its crawlers. The MSN Shopping bot is msnbot-products. The MSN...