Entries from Search Engine Roundtable tagged with 'crawlers'

Can Google's First Click Free Program be Used for Web Search?

Last night, I had a nice chat with Googler JohnMu. I joked around with John, asking if he had messed up yet in terms of Google's communication with webmasters. He said not really, which I agree with. But he...

Google Now Crawling Content Behind Forms

We should have seen this coming, based on the number of reports that Google was submitting GET forms. But often, it is hard to validate those types of reports, due to people spoofing Googlebot and similar tactics. In any event,...

Yahoo Slurp, Yahoo Search Crawler, Suffering From ADHD?

A DigitalPoint Forums thread has dozens of reports that Yahoo Search's crawler, Yahoo Slurp, took some bad medicine recently. Many are reporting that they see the crawler spidering their sites like never before. Sometimes they have seen the spider...

Google's Remove URLs Feature Removes For 90 Days Only

Just a tidbit based on a Google Groups thread: using the Google Remove URLs feature will only remove the content from Google for 90 days. After 90 days, if you do not block the page from crawlers or tell crawlers...

Yahoo Slurp Taking a Break? Reported Slow Crawling Activity

In August, Yahoo announced a new crawl behavior for Slurp, Yahoo's web crawler. The new crawl behavior was supposed to tame the crawler to go through your site in a more relaxed and efficient manner for both the crawler and...

Possible GoogleBot DNS Issues Causing Indexing Issues at Google.com

A detailed Google Groups thread has various reports from webmasters claiming GoogleBot is timing out before reaching their pages. First, these webmasters are noticing a drop in GoogleBot activity on their servers. So they log in to Google Webmaster Tools...

Managing the Robots.txt File for Sites Sharing Same Local Files

A Cre8asite Forums thread asks how a webmaster can generate a unique robots.txt file for each of his domains when all of those sites share the same local files through a form of IIS mirroring. There are several ways to do...

Webmasters Not Happy with Yahoo's New Crawl Behavior

Last week we reported on a Yahoo update and a new method of crawling. The new crawl behavior is supposed to help the Yahoo bot, Slurp, be more efficient on your site. It seems that many SEOs and Webmasters are...

Yahoo! Slurp Now Located at crawl.yahoo.net

The Yahoo! Search Blog has announced that webmasters will now see Yahoo's spider, Yahoo! Slurp, returning a new domain name in their logs. The same IP addresses now resolve to the domain name crawl.yahoo.net and no longer return the domain...

Yahoo! Slurp on the Loose?

WebmasterWorld and Search Engine Watch Forums threads are both reporting issues with Yahoo! Slurp (Yahoo!'s crawler) indexing pages it should not be, and in quantities that may be harmful. It appears that only specific bots are not obeying the...

Is Google Sending GoogleBot CSS Hunting: Google Crawling CSS Files?

A Cre8asite Forums thread links to a blog post named GoogleBot Requested a CSS File. This is not the first time I have seen threads where people suspect GoogleBot is crawling their CSS files. But this one has the most discussion...

How Does Google Crawl Pages & Index Them?

A WebmasterWorld thread asks "How does Google determine which pages to crawl?" Google didn't always crawl and index pages the way it does now. With the Big Daddy update, around April 2006, Google adapted its crawl priorities. Google now...

Google & Search Engines Do Not Mind Bad HTML When Crawling

An Adam Lasnik post in Google Groups sparked a post at Cre8asite Forums explaining that if you have bad HTML, Google will be OK with it. Yes, that is the case: your code does not need to be 100% validated...

Google Cache Archiving More of Your Page?

The Google cache typically only stored about 100KB of your page. So if you had a heavy page with lots of content, not all of that page would be seen in the Google cache. That seems to have changed at...

Bot Attacks: Yes It Can Happen To You

We all know about PPC fraud and that some of the fraud is caused by bots (robots) that click on the ads and drive up your bill and unwanted traffic. But it gets more serious than that. Bots are also...

Scrape Bots Vs. Search Bots :: Fighting the Battle

A Search Engine Watch Forums thread asks how one can prevent scraping of a site's content by an unauthorized spider without hurting the site's rankings in search engines. This is a serious issue, serious enough that there was a session...

MSN Crawlers Are Named

How cute. Seriously, MSN has finally given names to its baby crawlers. You know, Google names its crawlers, e.g. GoogleBot, MediaBot, etc. Yahoo has Slurp, etc. Now MSN has named its crawlers. The MSN Shopping bot is msnbot-products. The MSN...

