Entries from Search Engine Roundtable tagged with 'robots'

MSNBot Crawl Delay Doesn't Delay

Unfortunately, we have more MSNBot troubles to bring you. Microsoft Bing's spider, MSNBot, is apparently not obeying directives it should be obeying. In this case, it is the crawl delay command, where a couple of users are claiming...

Bing's MSNBot Crawl Happy?

Most search spiders have been known to get a bit crawl happy from time to time. But most complaints over time are about MSNBot, which often tends to get out of hand and send its spiders to individuals' sites...

Bing Masking MSNBot Under Mozilla's UserAgent & Reverse IP Fails

With all the ongoing issues with MSNBot not behaving, I am not too surprised to see more complaints about the little spider. New confirmed reports from Bing Forums show that MSNBot is hiding itself under the UserAgent of Mozilla/4.0....

DigitalPoint Founder Upset With MSNBot's Crawl Rate (MSNBot 2.0b)

Shawn Hogan, DigitalPoint's founder, has posted a thread at DigitalPoint Forums clearly showing his frustration with MSNBot, Microsoft Bing's search crawler. He is upset that the bot is crawling too much, too fast - causing an unnecessary spike in load...

MSNBot Clicking on Bing adCenter Search Ads?

There is a thread I have been watching at the Bing Community where one member said that he had log files that show MSNBot (Microsoft Bing's crawler) is clicking on Microsoft adCenter search ads, possibly charging him for those clicks....

New MSNBot Named adidxbot Causing Trouble

There are several reports around the web about a new search bot by Microsoft that is causing major issues for web servers. The bot is named adidxbot and the useragent looks like this: adidxbot/1.1 (+http://search.msn.com/msnbot.htm). This bot has been on...

Tracking Search Bot Activity

Back in the day, tracking how bots accessed your site was a bit of a craze. Now, you don't hear about it much. The old Google Analytics, aka Urchin, had a section for displaying bot activity on your site. It...

Multiple Robots.txt Files for Single Domain

A HighRankings Forum thread asks why some people use more than a single robots.txt file to control and instruct search spiders how to crawl and access their content. That is a good question. Typically, the spiders will only listen...

Live Search Begins Crawling JavaScript with MSNBot-Media

incrediBILL, moderator at WebmasterWorld, noticed that one of Live Search's bots was crawling through his JavaScript. The bot is named MSNBOT-MEDIA and he noticed that it was accessing JavaScript files and AJAX functions. He noticed that the bot was triggering...

Some Webmasters Are Banning Yahoo Slurp

It seems like some webmasters are becoming fed up with the activity of Yahoo's crawler, Yahoo Slurp, relative to the amount of traffic Yahoo Search is sending the web site. In fact, some webmasters have taken the plunge and banned...

Can Google's First Click Free Program be Used for Web Search?

Last night, I had a nice chat with Googler, JohnMu. I joked around with John, asking if he has messed up yet, in terms of Google communication with webmasters. He said not really - which I agree with. But he...

Google Now Crawling Content Behind Forms

We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...

Is Google's AdWords Spider Lowercasing Destination URLs?

A WebmasterWorld thread has an advertiser complaining that Google's AdWords spider appears to be lowercasing the destination URLs they have. The thing is, the lowercase URLs for this webmaster don't work with the site and they don't have the time...

MSNBot Again Failing Reverse DNS Test

Last December, we reported that MSNBot was failing a reverse DNS lookup. Well, guess what folks - MSNBot is failing again on some IP addresses. An updated WebmasterWorld thread brought this to my attention and I verified it myself. Here...

Microsoft Live Search Adds HTTP Compression & Conditional Gets Support to Crawler

The Live Search Blog announced several updates to its crawler. The first is a name change to reflect the upgrade: previously named msnbot/1.0, it is now named msnbot/1.1. The bulk of the changes include the HTTP Compression and Conditional Get...

Can You Hide Text & Links From Your Users, If Search Engines Won't See?

Here is an unusual scenario for you. You have a web page that is one of many pages on your site. This specific page does not allow search engines to crawl it by using a robots.txt file to...

Google's Remove URLs Feature Removes For 90 Days Only

Just a tidbit based on a Google Groups thread: using the Google Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...

WordPress Installation Now Blocking Search Engines?

A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are by default blocking all search engines. He said, when you go to the Privacy Options section in the administration panel, by default, it is set to...

Microsoft Live Search Continues Referral Spam Tests With MSLIVSOP?

I noticed an update to the WebmasterWorld thread with the discussion of the weird referrals in the form of spam-like referrals coming from Live Search as cloaking tests. It appears a webmaster is now noticing a bot named MSLIVSOP serving...

Ask.com Fixes Crawler Issue With Badly-Formed URLs

Earlier this week, we reported that Ask.com Crawler Inserting Url-Encoded Spaces in URLs Causing 404 Errors. In short, Ask.com's crawlers were crawling badly formed URLs, causing tons of 404 errors in web server log files. Vivek Pathak, Ask.com's Infrastructure Product...

Ask.com Crawler Inserting Url-Encoded Spaces in URLs Causing 404 Errors?

A WebmasterWorld thread is reporting several webmasters noticing that Ask.com's crawler has recently been generating tons of 404 (file not found) errors on their sites. The issue appears to stem from Ask.com auto inserting URL-Encoded spaces into the URL. URL-encoded...

MSNBot Reverse DNS Test Fails Requirements

A year ago, Microsoft promised to give webmasters a method of verifying MSNBot. Way too often, rogue spiders mask themselves as official spiders from Google, Yahoo, Live Search or Ask.com. The search engines have enabled methods to conduct reverse DNS...

Double Check Your Robots.txt: Google Testing New Crawler Directives

The post "Validate your robots.txt - Googlebot becomes smarter" from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...

Is a Robots.txt File Required for Search Engine Optimization?

A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...

Managing the Robots.txt File for Sites Sharing Same Local Files

A Cre8asite Forums thread asks how a webmaster can generate a unique robots.txt file for each domain he has, when all of those sites share the same local files through a form of IIS mirroring. There are several ways to do...

Yahoo! Supports Robots-Nocontent: Enabling Organic Search Page Section Targeting

Yesterday afternoon, Yahoo! announced support for a new attribute that Webmasters and SEOs can use on their pages to help the search spiders determine what content is the most important content on the page, by excluding extraneous or irrelevant...

Google & Search Engines Do Not Mind Bad HTML When Crawling

An Adam Lasnik post in Google Groups sparked a post at Cre8asite Forums explaining that if you have bad HTML, Google will be OK with it. Yes, that is the case: your code does not need to be 100% validated...

Bot Attacks: Yes It Can Happen To You

We all know about PPC fraud and that some of the fraud is caused by bots (robots) that click on the ads and drive up your bill and unwanted traffic. But it gets more serious than that. Bots are also...

Yahoo! Slurping Wildcards Via Robots.txt File

Thursday night last week, the Yahoo! Search Blog published "Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt". I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...

Scrape Bots Vs. Search Bots :: Fighting the Battle

A Search Engine Watch Forums thread asks how one can prevent scraping of his site's content by an unauthorized spider, while not hurting his rankings in search engines. This is a serious issue, serious enough that there was a session...

MSN Crawlers Are Named

How cute, seriously, MSN has finally given names to their baby crawlers. You know, Google names their crawlers, e.g. GoogleBot, MediaBot, etc... Yahoo has Slurp, etc. Now MSN has named their crawlers. The MSN Shopping bot is msnbot-products. The MSN...
