Entries from Search Engine Roundtable tagged with 'robots'

Google Now Crawling Content Behind Forms

We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...

Is Google's AdWords Spider Lowercasing Destination URLs?

In a WebmasterWorld thread, an advertiser complains that Google's AdWords spider appears to be lowercasing their destination URLs. The thing is, the lowercase URLs for this webmaster don't work with the site and they don't have the time...

MSNBot Again Failing Reverse DNS Test

Last December, we reported that MSNBot was failing a reverse DNS lookup. Well, guess what folks - MSNBot is failing again on some IP addresses. An updated WebmasterWorld thread brought this to my attention and I verified it myself. Here...

Microsoft Live Search Adds HTTP Compression & Conditional Gets Support to Crawler

The Live Search Blog announced several updates to their crawler. The first is a name change to reflect the upgrade: previously named msnbot/1.0, it is now named msnbot/1.1. The bulk of the changes include the HTTP Compression and Conditional Get...

Can You Hide Text & Links From Your Users, If Search Engines Won't See?

Here is an unusual scenario for you. You have a web page that is one of many on your site. This specific page does not allow search engines to crawl it, by using a robots.txt file to...

Google's Remove URLs Feature Removes For 90 Days Only

Just a tidbit based on a Google Groups thread: using the Google Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...

WordPress Installation Now Blocking Search Engines?

A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are by default blocking all search engines. He said, when you go to the Privacy Options section in the administration panel, by default, it is set to...

Microsoft Live Search Continues Referral Spam Tests With MSLIVSOP?

I noticed an update to the WebmasterWorld thread discussing the weird, spam-like referrals coming from Live Search as cloaking tests. It appears a webmaster is now noticing a bot named MSLIVSOP serving...

Ask.com Fixes Crawler Issue With Badly-Formed URLs

Earlier this week, we reported that Ask.com Crawler Inserting Url-Encoded Spaces in URLs Causing 404 Errors. In short, Ask.com's crawlers were crawling badly formed URLs, causing tons of 404 errors in web server log files. Vivek Pathak, Ask.com's Infrastructure Product...

Ask.com Crawler Inserting Url-Encoded Spaces in URLs Causing 404 Errors?

A WebmasterWorld thread is reporting several webmasters noticing that Ask.com's crawler has recently been generating tons of 404 (file not found) errors on their sites. The issue appears to stem from Ask.com auto inserting URL-Encoded spaces into the URL. URL-encoded...

MSNBot Reverse DNS Test Fails Requirements

A year ago, Microsoft promised to give webmasters a method of verifying MSNBot. Way too often, rogue spiders mask themselves as official spiders from Google, Yahoo, Live Search or Ask.com. The search engines have enabled methods to conduct reverse DNS...
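
As a rough illustration of the reverse DNS check this entry refers to, here is a minimal Python sketch. It assumes the common "reverse lookup, check the hostname, then forward-confirm" pattern. The function name, the `reverse`/`forward` parameters, and the hostname suffixes are assumptions made for this example, not details taken from the post. Check each search engine's own documentation for the hostnames it actually uses.

```python
import socket

# Assumed hostname suffixes, for illustration only.
TRUSTED_SUFFIXES = (".googlebot.com", ".search.msn.com")


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a trusted hostname
    and that hostname resolves forward to the same IP."""
    try:
        # Reverse lookup: IP -> hostname (first element of the tuple).
        hostname = reverse(ip)[0]
    except OSError:
        return False

    # A spoofed user-agent will not have a hostname under a trusted domain.
    if not hostname.lower().endswith(tuple(suffixes)):
        return False

    try:
        # Forward lookup: hostname -> list of IPs (third element).
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # The forward lookup must include the original IP.
    return ip in addresses
```

The resolver functions are passed in as parameters so the logic can be exercised without touching the network. In normal use you would call `is_verified_crawler(ip)` with the IP address taken from your server log.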

Double Check Your Robots.txt: Google Testing New Crawler Directives

Validate your robots.txt - Googlebot becomes smarter from Sebastian reports official confirmation from Google that they are testing out new crawler directives. He explains that adding "Noindex: /" to your robots.txt file will now deindex your complete site. Specifically, Google...
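
If you want to double check a robots.txt file for the "Noindex:" directive mentioned in this entry, a small Python sketch along these lines could help. The function name and the sample file contents are made up for this example, and it only reports lines whose field name is "Noindex". It does not model how any search engine actually interprets them.

```python
def find_noindex_rules(robots_txt):
    """Return (line_number, value) pairs for every Noindex line found."""
    hits = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Split "Field: value" on the first colon.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "noindex":
            hits.append((number, value.strip()))
    return hits


# Hypothetical robots.txt contents, for illustration only.
sample = """User-agent: *
Disallow: /private/
Noindex: /
# Noindex: /old
"""

for number, value in find_noindex_rules(sample):
    print(f"line {number}: Noindex applies to {value!r}")
```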

Is a Robots.txt File Required for Search Engine Optimization?

A Search Engine Watch Forums thread has a simple question. Is a robots.txt file required for SEO? The answer is no, a robots.txt file is not required. If you want the search engines to crawl your site, you do not...

Managing the Robots.txt File for Sites Sharing Same Local Files

A Cre8asite Forums thread asks how to generate a unique robots.txt file for each domain when each of those sites is sharing the same local files through a form of IIS mirroring. There are several ways to do...

Yahoo! Supports Robots-Nocontent: Enabling Organic Search Page Section Targeting

Yesterday afternoon, Yahoo! announced support for a new attribute that Webmasters and SEOs can use on their pages to help the search spiders determine what content is the most important content on the page, by excluding extraneous or irrelevant...

Google & Search Engines Do Not Mind Bad HTML When Crawling

An Adam Lasnik post in Google Groups sparked a post at Cre8asite Forums explaining that if you have bad HTML, Google will be OK with it. Yes, that is the case: your code does not need to be 100% validated...

Bot Attacks: Yes It Can Happen To You

We all know about PPC fraud and that some of the fraud is caused by bots (robots) that click on the ads and drive up your bill and unwanted traffic. But it gets more serious than that. Bots are also...

Yahoo! Slurping Wildcards Via Robots.txt File

Thursday night, last week, the Yahoo! Search Blog wrote Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt. I am honestly a bit shocked by the SEO community's response to this, or lack thereof. I have spotted two threads...

Scrape Bots Vs. Search Bots :: Fighting the Battle

A Search Engine Watch Forums thread asks how one can prevent scraping of a site's content by an unauthorized spider, while not hurting its rankings in search engines. This is a serious issue, serious enough that there was a session...

MSN Crawlers Are Named

How cute, seriously, MSN has finally given names to their baby crawlers. You know, Google names their crawlers, e.g. Googlebot, MediaBot, etc... Yahoo has Slurp, etc. Now MSN has named their crawlers. The MSN Shopping bot is msnbot-products. The MSN...

