
Why Does The Site Command Show More Indexed Pages Than Google's Sitemap Report?

A WebmasterWorld thread asks why the number of results for a site command in Google does not match the number of "indexed" URLs reported in Google Webmaster Tools. It's a very valid question, so let me show you.

A simple site command in Google for site:www.seroundtable.com returns 17,500 results. That means Google has indexed approximately 17,500 pages from the www of this domain.

Search Engine Index Counts

Now, if I log in and check my Sitemap data for this site (yea, I finally created a Sitemap file), it shows about half as many indexed URLs. It says Google has indexed 8,813 of the 9,086 URLs I submitted.
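To put the two reports side by side, here is a quick back-of-the-envelope calculation using only the figures quoted above. The variable names are mine, and the site command count is an approximation to begin with, so treat the results as rough.

```python
# Figures quoted in the post; the site: count is Google's own estimate.
site_command_results = 17_500  # results for site:www.seroundtable.com
sitemap_submitted = 9_086      # URLs submitted in the Sitemap file
sitemap_indexed = 8_813        # URLs Webmaster Tools reports as indexed

# Share of submitted Sitemap URLs that Google reports as indexed.
sitemap_coverage = sitemap_indexed / sitemap_submitted

# Sitemap "indexed" count relative to the site command count.
share_of_site_command = sitemap_indexed / site_command_results

# Rough number of site command results that cannot be Sitemap URLs.
outside_sitemap = site_command_results - sitemap_submitted

print(f"Sitemap coverage: {sitemap_coverage:.1%}")
print(f"Share of site command results: {share_of_site_command:.1%}")
print(f"Results not accounted for by the Sitemap: {outside_sitemap:,}")
```

So nearly all of the submitted URLs are reported as indexed, and the Sitemap figure is about half of what the site command shows.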

Search Engine Index Counts

For me, the answer is simple. I seem to be sending only the URLs of the individual blog posts here. So although I have about 9,000+ blog posts at this domain, I still have about twice as many pages on this site, due to the categories, date archives, tag landing pages and so on. Those pages are not included in my Sitemap file. So Google seems to be showing only the indexed URLs from what I submitted. Of course, it is hard for me to validate that by just looking at the numbers.
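One way to check this beyond eyeballing the totals is to compare the URLs in the Sitemap file against a list of URLs seen in the site command results. The sketch below is only an illustration: the function names are mine, the post URLs in the sample are made up, and it assumes you have collected the list of seen URLs yourself (by hand, for instance). It does not call any Google API.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def section_of(url):
    """Return the first path segment of a URL, e.g. 'tag' or 'archives'."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else "(home)"


def outside_sitemap_by_section(seen_urls, submitted):
    """Count URLs that were seen in search results but never submitted."""
    return Counter(section_of(u) for u in seen_urls if u not in submitted)


# Hypothetical sample data: the two post URLs are invented for illustration.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.seroundtable.com/archives/000001.html</loc></url>
  <url><loc>http://www.seroundtable.com/archives/000002.html</loc></url>
</urlset>
"""

seen_in_site_command = [
    "http://www.seroundtable.com/archives/000001.html",
    "http://www.seroundtable.com/archives/000002.html",
    "http://www.seroundtable.com/tag/digitalpoint",
    "http://www.seroundtable.com/tag/doubleclick",
    "http://www.seroundtable.com/tag/acquisitions",
    "http://www.seroundtable.com/",
]

submitted = sitemap_urls(sample_sitemap)
extra = outside_sitemap_by_section(seen_in_site_command, submitted)
print(extra)
```

If most of the extra URLs fall under sections like tags, categories and date archives, that would support the explanation above.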

What I found interesting is that when I went to Yahoo's Site Explorer, Yahoo told me they have indexed 16,498 of my pages but crawled only 15,022, and thus know about 16,498 of my pages. I guess via linkage data, they can index more of my pages than they actually crawl?

Search Engine Index Counts

In fact, Yahoo's numbers for an inurl:seroundtable.com command are almost on target with the numbers they report in Site Explorer, which is nice.

As for what is going on with Google... I am not sure if the results are accurate or not. Tedster at WebmasterWorld said:

I'm never surprised when Webmaster Tools information seems peculiar in some way - it happens a lot. Also note that site:example.com results are getting weirder and weirder, often omitting urls that definitely are in the index - sometimes with a simple site:example.com/directory/ query.

Forum discussion at WebmasterWorld.




Posted by rustybrick in Google Optimization on January 7, 2009 8:25 AM. Comments (4)

Comments

I'm not sure, but it could be the first result is returning total pages it has indexed, and the second/lesser result is displaying the indexed pages, but not including those that could be in their supplemental index.

 

I was wondering the same thing a couple of months back and did some testing. Based on my results, the indexed URLs they are reporting in Webmaster Tools are only the URLs which are both indexed and listed in the sitemap.xml file. If you don't include a URL in the file, even though it is available and indexed, Google won't report it as indexed. I like this feature because it helps me understand how many of the pages I feel are important are indexed, and it gets rid of the noise in the results of the site command search on Google.

 

Hi,
If you look at the sitemap (XML) and the results displayed for the site query in Google and compare the links, you can see that the pages for each tag are also being indexed, but those pages are not included in the sitemap file.
See here for some example links:
www.seroundtable.com/tag/digitalpoint
www.seroundtable.com/tag/doubleclick
www.seroundtable.com/tag/acquisitions

 

No Name, that is what I said in the post.

 
