Is Google Crawling Absolute Paths?

Jun 30, 2008 • 10:26 am | comments (3) by twitter | Filed Under Google Search Engine Optimization
 

We've repeatedly recommended Google Webmaster Tools for diagnosing problems with your website. It can show you where something is going wrong so you can track down the cause. In one specific example, a Cre8asite Forums member found that Google is crawling what look like absolute server paths, such as http://domain.com/var/www/html/page.php and /home/public_html/admin.html, which should never happen. Is it a problem with Google, or is there a structural issue with the site that the webmaster is not aware of?

In this case, the problem is probably not Google. Most likely a link pointing to that path exists somewhere on your site, and Google found it and followed it. As many forum members recommend, use a tool like Xenu Link Sleuth to find where that link is located.
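If you'd rather script a quick check than run a full link checker, the idea can be sketched in a few lines of Python. This is only an illustration, not how Xenu Link Sleuth works: the function name find_suspect_links and the SUSPECT_PREFIXES list are assumptions made for this example, and a prefix like /home/ can be a perfectly legitimate URL path on some sites, so treat the output as a list of candidates to review.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed prefixes that hint a server filesystem path leaked into a link.
# Adjust for your own server layout; these are illustrative only.
SUSPECT_PREFIXES = ("/var/www/", "/home/", "/usr/", "/srv/")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_suspect_links(html):
    """Return hrefs whose URL path starts with a filesystem-like prefix."""
    collector = LinkCollector()
    collector.feed(html)
    suspects = []
    for href in collector.hrefs:
        # urlparse separates the path from the scheme and host, so absolute
        # and root-relative links are checked the same way.
        if urlparse(href).path.startswith(SUSPECT_PREFIXES):
            suspects.append(href)
    return suspects
```

Feed it the HTML of each page on your site (saved to disk or fetched however you like) and look at which pages produce hits. That tells you where the bad link lives.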

In this case, it was a broken link. And no, Google cannot read the files at those server paths, as JohnMu mentions:

At any rate, we can't view your servers file-system (at least not if it's configured correctly), so even if we happen to stumble upon a URL like that, your error page should keep us from worrying too much about it (make sure it returns 404).
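To confirm that your error page really does return 404 for these bogus paths, a short script along these lines may help. It is a rough sketch using only Python's standard library; the function names fetch_status and urls_not_returning_404 are made up for this example, and it does not handle connection failures or timeouts gracefully.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for url.

    urlopen raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case. Network failures (URLError) are not
    caught here and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def urls_not_returning_404(urls, fetch=fetch_status):
    """Return the URLs whose status is anything other than 404.

    The fetch parameter exists so the check can be exercised without
    touching the network, for example with a dict lookup.
    """
    return [url for url in urls if fetch(url) != 404]
```

Pass in the odd URLs that Webmaster Tools reports, for example http://domain.com/var/www/html/page.php with your own domain substituted. Anything that comes back in the result is answering with something other than a 404, such as a 200 "page not found" page, and is worth a closer look.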

Forum discussion continues at Cre8asite Forums.

 

Comments:

gabs

06/30/2008 03:22 pm

Here a good example: http://www.google.co.uk/search?hl=en&safe=off&q=site:mrsite.co.uk&start=10&sa=N This was the massive uk cms that got hit last week...

Michael Martinez

06/30/2008 03:58 pm

These are server configuration issues. NO search engine can crawl a path outside of the Web server's internal name space. If a search engine finds those paths, it is being shown those paths as if they are within the hierarchy of the Web service's public HTML tree. These are more likely VIRTUAL paths than absolute paths.

Rob Abdul

07/01/2008 09:43 am

I agree with Michael.
