The Invisible Spider: Covert Crawler

Jan 17, 2006 - 8:52 am

A thread at the Cre8asite forums, "New kind of spider is in town," links to a Wired article titled "Covert Crawler Descends on Web." In short, the article describes a new kind of spider designed to crawl the Web in as human-like a way as possible.

How does it work?

The program comes from different internet addresses, simulates different browsers and throttles itself to human-like speeds... Hoffman's program downloads everything that comes with a page -- images, JavaScript and components like ActiveX and Flash -- instead of just hitting the page itself like traditional spiders do. It also simulates a full web browser, keeping a cache and requesting only new material... To select which links to click on, Hoffman has settled on a solution somewhere between a masterful AI and completely random selection. "In some ways it's a very simplified Turing test -- you can assign the different threads a personality. This crawler, you're the slow reader, you read the entire page." Another thread may spend less time on a page before it starts clicking on different links. "Each individual crawler has its own browser habits," he added.
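
To make the "personality" idea concrete, here is a rough sketch of how per-thread browsing habits might be modeled. It is not Hoffman's code: the class name, the timing ranges, and the user-agent strings are all made up for illustration, and link selection here is purely random, which is simpler than the middle ground the article describes.

```python
import random
from dataclasses import dataclass


@dataclass
class Personality:
    """Hypothetical browsing habits for one crawler thread."""
    name: str
    min_read_seconds: float   # assumed range, not from the article
    max_read_seconds: float
    user_agent: str           # placeholder string, not a real browser UA

    def read_delay(self, rng: random.Random) -> float:
        # How long this "reader" lingers on a page before clicking on.
        return rng.uniform(self.min_read_seconds, self.max_read_seconds)

    def pick_link(self, links, rng: random.Random):
        # Simplification: pick uniformly at random, or None on a dead end.
        return rng.choice(links) if links else None


# Two example personalities, loosely following the quote above.
SLOW_READER = Personality("slow reader", 30.0, 120.0, "ExampleBrowser/1.0")
SKIMMER = Personality("skimmer", 2.0, 10.0, "ExampleBrowser/2.0")
```

A real crawler thread would sleep for `read_delay(...)` seconds between requests and send the personality's user agent with each one; the sketch leaves out the networking entirely.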

Barry Welford calls this spider "somewhat scary," and I agree. Ron Carnell has it right: "any robot that doesn't ask for and then follow robots.txt is, by definition, unethical." Ron also gives a technique you can use to track and then block this type of bot.
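
Ron's exact technique is in the forum thread and is not reproduced here. As one common way to approach the same problem, the sketch below flags clients that request a trap path which robots.txt disallows and which no human visitor would normally follow. The trap path, the function name, and the `(ip, path)` input format are assumptions for illustration. Because it keys on IP address, and the crawler described above comes from different addresses, treat it as a partial measure at best.

```python
# Hypothetical trap URL: listed under "Disallow:" in robots.txt and
# linked from pages in a way human visitors would not normally follow.
TRAP_PATH = "/bot-trap/"


def find_suspects(requests):
    """Flag clients that requested the disallowed trap path.

    `requests` is an iterable of (ip, path) tuples, e.g. parsed from an
    access log. Returns a dict mapping each suspect IP to a short reason.
    """
    fetched_robots = set()
    hit_trap = set()
    for ip, path in requests:
        if path == "/robots.txt":
            fetched_robots.add(ip)
        elif path.startswith(TRAP_PATH):
            hit_trap.add(ip)

    suspects = {}
    for ip in hit_trap:
        if ip in fetched_robots:
            suspects[ip] = "requested robots.txt but ignored it"
        else:
            suspects[ip] = "never requested robots.txt"
    return suspects
```

The returned IPs could then be fed into whatever blocking mechanism your server uses, such as a deny list.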

Forum discussion at Cre8asite Forums.

 
