Verify The Bots Accessing Your Site: Is Google.com Sending That GoogleBot?

Mar 7, 2007 • 7:13 am | Filed Under Google Search Engine
 

There is no doubt that a ton of the bot activity on one's site comes from rogue spiders: spiders or bots that pretend to be legitimate crawlers but are really there to steal your content. We have covered this in several sessions in the past.

A new Cre8asite Forums thread asks how one can verify that a visitor claiming to be GoogleBot really comes from Google.

Matt Cutts posted a detailed "How to verify Googlebot" post on the Webmaster Central Blog back on 9/20/2006, explaining how to do a reverse DNS lookup followed by a forward DNS->IP lookup.

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; e.g.:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
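The "is the name in the googlebot.com domain" step is easy to get subtly wrong. A plain substring test such as `"googlebot.com" in name` would also accept a name like `googlebot.com.attacker.example`, which anyone who owns `attacker.example` could publish as reverse DNS for their own IPs. Below is a minimal Python sketch of a stricter check; the function name `in_googlebot_domain` is hypothetical, and it covers only the name check, not the forward lookup.

```python
def in_googlebot_domain(hostname: str, domain: str = "googlebot.com") -> bool:
    """True if hostname is `domain` itself or a subdomain of it."""
    # `host` prints names with a trailing dot (the DNS root), as in the
    # example output; strip it. DNS names are case-insensitive, so lowercase.
    name = hostname.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    # Compare on a label boundary so "fakegooglebot.com" and
    # "googlebot.com.attacker.example" are rejected.
    return name == domain or name.endswith("." + domain)


for candidate in (
    "crawl-66-249-66-1.googlebot.com.",
    "googlebot.com.attacker.example",
    "fakegooglebot.com",
):
    print(candidate, in_googlebot_domain(candidate))
```

Only the first name passes; the other two merely contain the string "googlebot.com" without being inside that domain.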

I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.

There are, of course, ways to automate this: code it yourself, buy CrawlWall, or implement a solution similar to Ekstreme's PHP Search Engine Bot Authentication.
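If you do code it yourself, much of the extra work is picking the requests that claim to be Googlebot out of your access log and checking each distinct IP only once, since every check costs two DNS queries. The Python sketch below assumes the Apache/nginx "combined" log format and takes the DNS check itself as a parameter, `verify` (any function that returns True for a genuine crawler IP, such as your own reverse-then-forward lookup). The name `find_impostors` and the regular expression are illustrative assumptions, not taken from CrawlWall or Ekstreme's script.

```python
import re
from functools import lru_cache
from typing import Callable, Dict, Iterable

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def find_impostors(
    lines: Iterable[str], verify: Callable[[str], bool]
) -> Dict[str, int]:
    """Count requests per IP that claim to be Googlebot but fail `verify`."""
    # Cache per IP so each address costs at most one verification.
    cached_verify = lru_cache(maxsize=None)(verify)
    impostors: Dict[str, int] = {}
    for line in lines:
        m = COMBINED.match(line)
        if not m or "googlebot" not in m.group("ua").lower():
            continue  # unparseable line, or not claiming to be Googlebot
        ip = m.group("ip")
        if not cached_verify(ip):
            impostors[ip] = impostors.get(ip, 0) + 1
    return impostors
```

You would feed it the lines of your log, for example `find_impostors(open("access.log"), verify=my_check)`, where the path and `my_check` are placeholders for your own log file and DNS check. Lines with escaped quotes inside the request or User-Agent field will not match this simple pattern and are skipped.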

Rogue spiders are no fun, as we have seen with some forums.

Forum discussion at Cre8asite Forums.


Comments:

Lucian

03/16/2007 12:38 pm

You can use http://www.ipgp.net to get exact information and a map for IPs
