How Does Googlebot Find/Index Hidden FTP Logs?

Jul 14, 2006 - 3:16 pm
Filed Under Miscellaneous

Google's crawler constantly scours the Internet for pages to index, which is one reason to run away if someone offers to "submit your website to Google." For any page you do not want indexed, it is important to block Googlebot (the name of Google's spider), for example with a robots.txt Disallow rule or a robots meta tag. A prime example is a new page still under construction, especially if it contains content you already have in the index on "live pages."
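One common way to tell Googlebot to stay away from a page is a robots.txt Disallow rule. As a minimal sketch, the snippet below uses Python's standard urllib.robotparser module to check whether a set of rules would block Googlebot from a URL. The robots.txt contents, the paths, the example.com URLs, and the helper name googlebot_may_fetch are all made up for illustration and do not come from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are illustrative assumptions only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /under-construction/
Disallow: /logs/
"""


def googlebot_may_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow Googlebot to crawl the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    print(googlebot_may_fetch("https://example.com/logs/ftp.log"))
    print(googlebot_may_fetch("https://example.com/index.html"))
```

Keep in mind that a Disallow rule is only a request that well-behaved crawlers honor. It does not restrict access to the file, and robots.txt itself is publicly readable, so it is not a substitute for real protection.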

A recent thread at WebmasterWorld Forums shows us another example of pages you probably don't want in the index: your FTP logs. The member complains:

My FTP log is cached by Google...and there has never been a link to it, ever!

The first response is fairly obvious: FTP logs and any other pages you do not want indexed should be password protected, which makes it impossible for Googlebot to crawl them. So, with no links pointing to the pages and assuming they are protected, could Googlebot still find the URL and "accidentally" index it?
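To make the password-protection point concrete, here is a rough sketch of the check a server might run before serving a log file over HTTP Basic authentication. It is not how the poster's server works (the thread does not say). The credentials and the function name status_for_request are hypothetical. The idea is simply that a crawler, which sends no credentials, gets a 401 response instead of the log contents.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "admin"
EXPECTED_PASSWORD = "change-me"


def status_for_request(authorization_header):
    """Return the HTTP status a protected log URL would answer with."""
    prefix = "Basic "
    if not authorization_header or not authorization_header.startswith(prefix):
        return 401  # No credentials sent, which is what a crawler would do.
    try:
        raw = base64.b64decode(authorization_header[len(prefix):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # Covers malformed base64 and invalid UTF-8.
        return 401
    user, separator, password = decoded.partition(":")
    if not separator:
        return 401
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return 200 if (user_ok and password_ok) else 401


if __name__ == "__main__":
    token = base64.b64encode(b"admin:change-me").decode("ascii")
    print(status_for_request(None))
    print(status_for_request("Basic " + token))
```

In practice this kind of protection is usually configured in the web server itself rather than written by hand. The sketch only shows why a protected URL gives a crawler nothing to index even if it learns the address.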

One member astutely reminds readers that

When you use the Google toolbar and have the PageRank bar enabled, it sends url data to Google so.. so Google knows what urls exist out there, even if they are not linked to anywhere. So you have to be careful about what links you pull up when the PageRank bar is enabled.

The original poster comes back and thanks everyone for their responses, but claims it’s not that simple. The discussion continues at WebmasterWorld Forums.

 
