Entries from Search Engine Roundtable tagged with 'spiders'

Bing's MSNBot Crawl Happy?

Most search spiders have been known to get a bit crawl happy from time to time. But the most complaints over time come from MSNBot, which often tends to get out of hand and send their spiders on individuals sites...

Bing Masking MSNBot Under Mozilla's UserAgent & Reverse IP Fails

With all the ongoing issues with MSNBot not behaving, I am not too surprised to see more complaints about the little spider. New confirmed reports from Bing Forums show that MSNBot is hiding itself under the UserAgent of Mozilla/4.0....

Stop Spiders From Crawling Your Site on Shabbat, Including GoogleBot

A Google Webmaster Help thread has an interesting discussion around blocking your site from coming up for both visitors and search engine crawlers on Shabbat (the Jewish Saturday). This is not a new topic; we discussed using cloaking for religious...

DigitalPoint Founder Upset With MSNBot's Crawl Rate (MSNBot 2.0b)

Shawn Hogan, DigitalPoint's founder, has posted a thread at DigitalPoint Forums clearly showing his frustration with MSNBot, Microsoft Bing's search crawler. He is upset that the bot is crawling too much, too fast - causing an unnecessary spike in load...

MSNBot Clicking on Bing adCenter Search Ads?

There is a thread I have been watching at the Bing Community where one member said that he had log files that show MSNBot (Microsoft Bing's crawler) is clicking on Microsoft adCenter search ads, possibly charging him for those clicks....

New MSNBot Named adidxbot Causing Trouble

There are several reports around the web about a new search bot by Microsoft that is causing major issues for web servers. The bot is named adidxbot and the useragent looks like this: adidxbot/1.1 (+http://search.msn.com/msnbot.htm). This bot has been on...

Tracking Search Bot Activity

Back in the day, tracking how bots accessed your site was a bit of a craze. Now, you don't hear about it much. The old Google Analytics, aka Urchin, had a section for displaying bot activity on your site. It...

Multiple Robots.txt Files for Single Domain

A HighRankings Forum thread asks why some people use more than a single robots.txt file to control and instruct search spiders how to crawl and access their content. That is a good question. Typically, the spiders will only listen...

Live Search Begins Crawling JavaScript with MSNBot-Media

incrediBILL, moderator at WebmasterWorld, noticed that one of Live Search's bots was crawling through his JavaScript. The bot is named MSNBOT-MEDIA and he noticed that it was accessing JavaScript files and AJAX functions. He noticed that the bot was triggering...

Did Google Stop Crawling Blogger Blogs on Personalized Domains?

There are threads at Google Groups and DigitalPoint Forums with multiple reports of Google not crawling Blogger-hosted blogs that are on custom or private domains (i.e. not on blogspot.com domains). Many have reported that the Googlebot crawling has stopped...

Did Ask.com Stop or Slow Crawling the Web?

A WebmasterWorld thread reports that Ask.com's crawler seems to have slowed to a halt. Some webmasters are reporting zero crawling activity from Ask.com, while others are reporting extremely limited crawling activity. WebmasterWorld moderator, jdMorgan, noticed the slowdown too, he...

Google Improves Flash Indexing Again

On the first day of this month, we reported that Google and Yahoo were to begin indexing Flash files. According to the pertinent Google Webmaster Central blog post, Google is able to crawl the contextual elements in these blog posts....

Google Retraction: Blocking Regions Is Not Cloaking

Yesterday, we reported that Google's John Mueller said that if you block a whole region from accessing your site, it would be considered cloaking and thus be against Google's Webmaster guidelines. Since then, we have seen many comments on that...

Updated: Google Says Blocking Countries Outside of the US is Against Policies

A Google Groups thread has a webmaster who has been receiving a lot of rogue spider attacks from the Africa region. He wants to go as far as banning the whole continent of Africa. But he is concerned that by...

Can You Remove a Site From a Country Specific Google Search Engine?

An unusual question came up at WebmasterWorld, asking if you can request a site to be completely removed from a country specific Google search engine. For example, the site owner wants to remove his site from Google Netherlands, because the...

Can Google's First Click Free Program be Used for Web Search?

Last night, I had a nice chat with Googler, JohnMu. I joked around with John, asking if he has messed up yet, in terms of Google communication with webmasters. He said not really - which I agree with. But he...

Google Now Crawling Content Behind Forms

We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...

Is Google's AdWords Spider Lowercasing Destination URLs?

A WebmasterWorld thread has an advertiser complaining that Google's AdWords spider appears to be lowercasing the destination URLs they have. The thing is, the lowercase URLs for this webmaster don't work with the site and they don't have the time...

Yahoo Slurp, Yahoo Search Crawler, Suffering From ADHD?

A DigitalPoint Forums thread has dozens of reports that Yahoo Search's crawler, Yahoo Slurp, took some bad medicine recently. Many are reporting that they see the crawler spidering their sites like never before. Sometimes they have seen the spider...

MSNBot Again Failing Reverse DNS Test

Last December, we reported that MSNBot was failing a reverse DNS lookup. Well, guess what folks - MSNBot is failing again on some IP addresses. An updated WebmasterWorld thread brought this to my attention and I verified it myself. Here...

Google's Remove URLs Feature Removes For 90 Days Only

Just a tidbit based on a Google Groups thread: using the Google Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...

Wordpress Installation Now Blocking Search Engines?

A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are by default blocking all search engines. He said, when you go to the Privacy Options section in the administration panel, by default, it is set to...

MSNBot Reverse DNS Test Fails Requirements

A year ago, Microsoft promised to give webmasters a method of verifying MSNBot. Way too often, rogue spiders mask themselves as official spiders from Google, Yahoo, Live Search or Ask.com. The search engines have enabled methods to conduct reverse DNS...

Does Googlebot Index Faster When Integrating Google Custom Search?

Is there any mileage to the claim that it is possible to get your site spidered faster if you integrate the Google Custom Search Engine into your website? This is the question a new webmaster is asking on the High...

Verify The Bots Accessing Your Site: Is Google.com Sending That GoogleBot?

There is no doubt that a ton of bot activity on one's sites is from rogue spiders: spiders or bots that pretend to be legit bots but are there to steal your content. We have covered several sessions on this...

Google Terminates AdWords Account for "Cookie Spidering"?

A WebmasterWorld member reports that his AdWords account has been terminated after being in good standing for four years due to "cookie spidering." He said he has spent about $100,000 per year for the past four years, and all of a...

Scrape Bots Vs. Search Bots :: Fighting the Battle

A Search Engine Watch Forums thread asks how one can prevent scraping of his site's content by a non-authorized spider, while not hurting his rankings in search engines. This is a serious issue, serious enough that there was a session...
