Google Site Command Inflated?

Aug 23, 2005 • 9:21 am | comments (2) | Filed Under Google Search Engine Optimization
 

One of my favorite commands in Google is the site:www.domain.com command. To see all the pages Google (or most other engines) has indexed for a domain, you simply type in site:www.domain.com. For example, to see how many pages of its own search results MSN Search has indexed (laugh out loud), go to search.msn.com and plug in site:search.msn.com; it returns 50,077,341. The command works well on Google, too.
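If you want to script these lookups rather than type them by hand, a site: query is just text in the search box, so it can be turned into a results URL. The sketch below assumes Google's standard `https://www.google.com/search?q=` results URL; the helper name `google_site_query_url` is hypothetical, and any extra operators are simply appended as additional space-separated terms.

```python
from urllib.parse import urlencode


def google_site_query_url(domain: str, *extra_operators: str) -> str:
    """Build a results URL for a site: query (hypothetical helper, a sketch only)."""
    # Search operators are plain space-separated terms inside the q parameter.
    query = " ".join([f"site:{domain}", *extra_operators])
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_query_url("www.domain.com"))
```

Opening the printed URL in a browser should be equivalent to typing site:www.domain.com into the search box.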

At Google, I prefer the syntax allinurl:www.google.com site:www.google.com, because it tends to order the pages by popularity (no proof, of course). You will also notice that Google doesn't index its own SERPs the way MSN does. A forum thread at WebmasterWorld asks, Why are "Site:" command pages inflated? Members lammert, g1smd, and bull all provide solid answers, which I quote below:

  • URLs temporarily deleted with the URL removal tool
  • URLs from other sites doing a 302 hijack of your site (should be fixed by now)
  • Obsolete URLs that still have links to them from other sites and that Google visits now and then just to see if they are active
  • Links to your site with typos in them, e.g. www.yourdomain.com/fiel.html instead of www.yourdomain.com/file.html. At one time I had many copies of my sitemap in the SERPs because I used the sitemap as my 404 page. Except for the original sitemap, they have all gone supplemental, but Google still counts them.
  • URLs that have been marked with "noindex,follow".
  • Serving both www and non-www but without a redirect.
  • Items crawled by the Mozilla Googlebot only.

Add to that the fact that Google includes the supplemental index in that count, not in the API results but in the normal Web search results. Also, you might think you have X pages on a dynamic site, but a dynamically driven Web site can generate an infinite number of URLs.
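Several of the causes above (www and non-www served without a redirect, stray #fragments, the same dynamic page reached with its parameters in a different order) produce different URLs for what is really one page. One rough way to see how much that inflates a count is to normalize a list of URLs and count what remains. The sketch below is only an illustration: the normalization rules and the function names `normalize_url` and `count_distinct_pages` are assumptions, not anything Google documents, and it cannot collapse genuinely different URLs such as the fiel.html typo.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants (assumed rules, a sketch only)."""
    parts = urlsplit(url)
    # Treat www and non-www as the same host, case-insensitively.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 count once.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((parts.scheme, host, path, query, ""))


def count_distinct_pages(urls) -> int:
    """Number of distinct pages left after normalization."""
    return len({normalize_url(u) for u in urls})


urls = [
    "http://www.yourdomain.com/file.html",
    "http://yourdomain.com/file.html",
    "http://www.yourdomain.com/file.html#top",
    "http://www.yourdomain.com/list?b=2&a=1",
    "http://yourdomain.com/list?a=1&b=2",
]
print(len(urls), count_distinct_pages(urls))
```

Here five raw URLs reduce to two pages. Note that sorting parameters does nothing to cap the number of URLs a dynamic site can generate; it only merges reorderings of the same one.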

 

Comments:

Vic

08/23/2005 03:52 pm

It's interesting you make this comment, because today I have been trying to analyze the count Google reports for my entire domain: 1.56MM pages. But if I take the directories that actually exist and do a site:xxx.xxxx.xxx inurl:xxxx for each, I am only able to account for 500k pages. Where are the other million pages they claim to have in their index?

Drew

02/05/2009 07:24 pm

Any ideas why omitting the www shows a ton more?
