Entries from Search Engine Roundtable tagged with 'crawling'

Google Sitemaps Last Download Date, Should We Care?

A Google Webmasters Help thread has a webmaster worried that Google has not downloaded his XML Sitemap file in about five days. I went to check the status of my sitemap file in Google Webmaster Tools and Google has not...

Multiple Robots.txt Files for Single Domain

A HighRankings Forum thread asks why some people use more than one robots.txt file to instruct search spiders how to crawl and access their content. That is a good question. Typically, the spiders will only listen...

Survey Says: Google Sitemaps Gets Credit For Faster Indexing

About a week ago, we ran a poll asking Who's To Credit For Faster Indexing? The options included Google Sitemaps or FeedBurner, due to the topic we were discussing. The results are now in and the majority said, Google Sitemaps,...

Who's To Blame For Faster Indexing? Google Sitemaps or FeedBurner

An SEOMoz post charts the positive impact having a Sitemap file can have on the speed of Google and Yahoo crawling and indexing your web pages. The report seems pretty impressive and I myself feel that Sitemaps are important to...

Why Does The Site Command Show More Indexed Pages Than Google's Sitemap Report?

A WebmasterWorld thread asks why the site command in Google does not match the number of "indexed" URLs reported in Google Webmaster Tools. A very valid question; let me show you. A simple site command in Google for...

Live Search Begins Crawling JavaScript with MSNBot-Media

incrediBILL, a moderator at WebmasterWorld, noticed that one of Live Search's bots was crawling through his JavaScript. The bot is named MSNBOT-MEDIA, and it was accessing JavaScript files and AJAX functions. He noticed that the bot was triggering...

Why Shouldn't SEOs Obsess Over the Site Command

Many SEOs use the site command to see how healthy their site is in a particular search engine. So you plug in site:www.mydomain.com in a search engine and the search engine will return the number of pages it has indexed...

Ditched GoogleBot But Now Want To Make Friends Again?

I found an interesting tidbit while reading a somewhat detailed thread at Google Groups. The scenario is as follows. You have blocked Googlebot from accessing your site for a six-month period or so. Then you want to welcome Googlebot...

Is Google Crawling & Indexing All of My Pages?

Michael Gray has composed a post that helps SEOs find out which pages of their site haven't been crawled, which becomes increasingly important due to Google's removal of the supplemental index. He explains that you should put a timestamp...

Did Google Stop Crawling Blogger Blogs on Personalized Domains?

There are threads at Google Groups and DigitalPoint Forums with multiple reports of Google not crawling Blogger-hosted blogs that are on custom or private domains (i.e., not on blogspot.com domains). Many have reported that the Googlebot crawling has stopped...

Can You Remove a Site From a Country Specific Google Search Engine?

An unusual question came up at WebmasterWorld, asking if you can request that a site be completely removed from a country-specific Google search engine. For example, the site owner wants to remove his site from Google Netherlands, because the...

Google Crawl Rate Drops: Google Responds?

Yesterday I reported that GoogleBot is crawling fewer pages than it once was, based on a large WebmasterWorld thread. Now, I spotted a response from a Googler at a Google Groups thread with similar complaints. This time, I decided to...

GoogleBot Getting Tired? Google's Spider Crawling Less Documents

In a WebmasterWorld thread, dozens of webmasters report that GoogleBot, Google's web crawler, has not been crawling as many documents as it has in the past. Many webmasters are noticing reductions in crawl rates of as much as 90 percent, relative to...

Managing Duplicate Content In a World Where Google Can Crawl JavaScript

Now that Google has admitted to crawling JavaScript and forms, SEOs and webmasters need to be aware of how to manage even more duplicate content issues. In the past, a good strategy was to build out filter pages (filter by color,...

WordPress Installation Now Blocking Search Engines?

A WebmasterWorld thread reports that new installations of the popular blogging software, WordPress, are by default blocking all search engines. The poster said that when you go to the Privacy Options section in the administration panel, by default, it is set to...

Google's Set Crawl Rate Feature Works at Domain or Sub Domain Only

A Google Groups thread has a fairly simple but educational FAQ on how the "Set Crawl Rate" feature works in Google Webmaster Tools. In short, you can only set the crawl rate for a site on the domain or subdomain...

Yahoo Slurp Taking a Break? Reported Slow Crawling Activity

In August, Yahoo announced a new crawl behavior for Slurp, Yahoo's web crawler. The new crawl behavior was supposed to tame the crawler to go through your site in a more relaxed and efficient manner for both the crawler and...

How To Ask GoogleBot (Google) To Crawl Your Site

Last week Tamar wrote about How to Stop Googlebot from Crawling Your Site Rapidly, so I thought I would write about the opposite: how can you induce GoogleBot into crawling your site? Although there is no magic shot that guarantees inducement...

GoogleBot Not Sending IF_MODIFIED_SINCE Request?

A WebmasterWorld thread discusses a more detailed issue with how Google's spider, GoogleBot, is crawling some pages. Let me quote the detailed explanation: I've tried: Checking for the HTTP_IF_MODIFIED_SINCE header and returns "304 Not Modified" if possible. Problem: Googlebot doesn't...

Possible GoogleBot DNS Issues Causing Indexing Issues at Google.com

A detailed Google Groups thread contains various reports from webmasters claiming GoogleBot is timing out before reaching their pages. First, these webmasters are noticing a drop in GoogleBot activity on their server. So they log in to Google Webmaster Tools...

Managing the Robots.txt File for Sites Sharing Same Local Files

A Cre8asite Forums thread asks how a webmaster can generate a unique robots.txt file for each of his domains when those sites all share the same local files through a form of IIS mirroring. There are several ways to do...

Made for AdSense Sites Can Get You Delisted

A very interesting Google Groups thread has many bloggers, including SEO Buzz Box, DaveN, and Search Engine Journal voicing their reactions. The background is that the webmaster of AlkenMRS.com realized that his 10+ year old site had been delisted from...
