Unfortunately, we have more MSNBot troubles to bring to you. Microsoft Bing's spider, MSNBot, is apparently not listening to directives it should be listening to. In this case, it is the crawl-delay directive, where a couple of users are claiming...
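For context, a crawl delay is requested in robots.txt with a `Crawl-delay` line under a `User-agent` group. The following is a minimal sketch, using Python's standard `urllib.robotparser`, of how a well-behaved crawler could read that value. The robots.txt content and the 10-second delay are hypothetical examples, not taken from any of the reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask MSNBot to wait 10 seconds between requests
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The delay a crawler identifying as msnbot/1.1 is asked to honor.
msnbot_delay = parser.crawl_delay("msnbot/1.1")

# Crawlers with no Crawl-delay line of their own get None.
other_delay = parser.crawl_delay("SomeOtherBot")

print(msnbot_delay, other_delay)
```

The directive is only a request. Nothing in robots.txt forces a crawler to honor it, which is why complaints like the one above show up in server logs rather than as errors.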
Most search spiders have been known to get a bit crawl happy from time to time. But over time, most of the complaints have been about MSNBot, which often tends to get out of hand and send its spiders on individual sites...
With all the ongoing issues with MSNBot not behaving, I am not too surprised to see more complaints about the little spider. New confirmed reports from Bing Forums show that MSNBot is hiding itself under the UserAgent of Mozilla/4.0....
Shawn Hogan, DigitalPoint's founder, has posted a thread at DigitalPoint Forums clearly showing his frustration with MSNBot, Microsoft Bing's search crawler. He is upset that the bot is crawling too much, too fast - causing an unnecessary spike in load...
There is a thread I have been watching at the Bing Community where one member said that he had log files that show MSNBot (Microsoft Bing's crawler) is clicking on Microsoft adCenter search ads, possibly charging him for those clicks....
There are several reports around the web about a new search bot by Microsoft that is causing major issues for web servers. The bot is named adidxbot and the useragent looks like this: adidxbot/1.1 (+http://search.msn.com/msnbot.htm). This bot has been on...
Back in the day, tracking how bots accessed your site was a bit of a craze. Now, you don't hear about it much. The old Google Analytics, aka Urchin, had a section for displaying bot activity on your site. It...
A HighRankings Forum thread asks why some people use more than a single robots.txt file to control and instruct search spiders on how to crawl and access their content. That is a good question. Typically, the spiders will only listen...
incrediBILL, moderator at WebmasterWorld, noticed that one of Live Search's bots was crawling through his JavaScript. The bot is named MSNBOT-MEDIA and he noticed that it was accessing JavaScript files and AJAX functions. He noticed that the bot was triggering...
It seems like some webmasters are becoming fed up with the activity of Yahoo's crawler, Yahoo Slurp, relative to the amount of traffic Yahoo Search is sending the web site. In fact, some webmasters have taken the plunge and banned...
Last night, I had a nice chat with Googler, JohnMu. I joked around with John, asking if he has messed up yet, in terms of Google communication with webmasters. He said not really - which I agree with. But he...
We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...
A WebmasterWorld thread has an advertiser complaining that Google's AdWords spider appears to be lowercasing the destination URLs they have. The thing is, the lowercase URLs for this webmaster don't work with the site and they don't have the time...
Last December, we reported that MSNBot was failing a reverse DNS lookup. Well, guess what folks - MSNBot is failing again on some IP addresses. An updated WebmasterWorld thread brought this to my attention and I verified it myself. Here...
The Live Search Blog announced several updates to its crawler. The first is a name change to reflect the upgrade: previously named msnbot/1.0, it is now named msnbot/1.1. The bulk of the changes include the HTTP Compression and Conditional Get...
Here is an unusual scenario for you. You have a web page that is one of many on your site. This specific page does not allow search engines to crawl it, by using a robots.txt file to...
Just a tidbit based on a Google Groups thread: using the Google Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...
A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are by default blocking all search engines. He said, when you go to the Privacy Options section in the administration panel, by default, it is set to...
I noticed an update to the WebmasterWorld thread with the discussion of the weird referrals in the form of spam-like referrals coming from Live Search as cloaking tests. It appears a webmaster is now noticing a bot named MSLIVSOP serving...
Earlier this week, we reported that Ask.com Crawler Inserting Url-Encoded Spaces in URLs Causing 404 Errors. In short, Ask.com's crawlers were crawling badly formed URLs, causing tons of 404 errors in web server log files. Vivek Pathak, Ask.com's Infrastructure Product...
A WebmasterWorld thread is reporting several webmasters noticing that Ask.com's crawler has recently been generating tons of 404 (file not found) errors on their sites. The issue appears to stem from Ask.com auto inserting URL-Encoded spaces into the URL. URL-encoded...
A year ago, Microsoft promised to give webmasters a method of verifying MSNBot. Way too often, rogue spiders mask themselves as official spiders from Google, Yahoo, Live Search or Ask.com. The search engines have enabled methods to conduct reverse DNS...
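A reverse DNS check of this kind is usually paired with a forward lookup, so that a spoofed PTR record alone does not pass. The following is a minimal sketch in Python under stated assumptions: the `search.msn.com` suffix, the sample hostname, and the IP address are illustrative, and the function name `is_verified_crawler` is hypothetical. Check each search engine's own documentation for the hostnames it actually uses.

```python
import socket

def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler IP.

    `reverse` and `forward` default to the real resolver calls and can
    be swapped for stubs when testing without network access.
    """
    try:
        hostname = reverse(ip)[0]          # step 1: IP -> hostname (PTR)
    except OSError:
        return False
    host = hostname.rstrip(".").lower()
    # step 2: hostname must sit under a domain the search engine owns
    if not any(host.endswith("." + suffix) for suffix in allowed_suffixes):
        return False
    try:
        addresses = forward(hostname)[2]   # step 3: hostname -> IPs
    except OSError:
        return False
    return ip in addresses                 # step 4: must map back to the IP

# Offline usage example with stub resolvers (hypothetical host and IP).
def fake_reverse(ip):
    return ("msnbot-192-0-2-10.search.msn.com", [], [ip])

def fake_forward(hostname):
    return (hostname, [], ["192.0.2.10"])

print(is_verified_crawler("192.0.2.10", ["search.msn.com"],
                          fake_reverse, fake_forward))
```

A rogue spider can set any user-agent string it likes, but it cannot easily make both lookups agree on a hostname inside the search engine's own domain, which is what this check relies on.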
Validate your robots.txt - Googlebot becomes smarter from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...
A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...
A Cre8asite Forums thread asks how a webmaster can generate unique robots.txt files for each of his domains when those sites all share the same local files through a form of IIS mirroring. There are several ways to do...
Yesterday afternoon, Yahoo! announced support for a new attribute that Webmasters and SEOs can use on their pages to help the search spiders determine what content is the most important content on the page, by excluding extraneous or irrelevant...
An Adam Lasnik post in Google Groups sparked a post at Cre8asite Forums explaining that if you have bad HTML, Google will be OK with it. Yes, that is the case; your code does not need to be 100% validated...
We all know about PPC fraud and that some of the fraud is caused by bots (robots) that click on the ads and drive up your bill and unwanted traffic. But it gets more serious than that. Bots are also...
Thursday night, last week, the Yahoo! Search Blog wrote Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt. I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...
A Search Engine Watch Forums thread asks how one can prevent scraping of his site's content by an unauthorized spider, while not hurting his rankings in search engines. This is a serious issue, serious enough that there was a session...
How cute, seriously, MSN has finally given names to their baby crawlers. You know, Google names their crawlers, i.e. GoogleBot, MediaBot, etc... Yahoo has Slurp, etc. Now MSN has named their crawlers. The MSN Shopping bot is msnbot-products. The MSN...