Why Does The Site Command Show More Indexed Pages Than Google's Sitemap Report?

Jan 7, 2009 • 8:25 am | comments (6) | Filed Under Google Search Engine Optimization

A WebmasterWorld thread asks why the number of results for a site command in Google does not match the number of "indexed" URLs reported in Google Webmaster Tools. It is a very valid question, so let me show you.

A simple site command in Google for site:www.seroundtable.com returns 17,500 results. That means Google has indexed approximately 17,500 pages from the www of this domain.

Search Engine Index Counts

Now, if I log in and check my Sitemap data for this site (yes, I finally created a Sitemap file), it shows only about half as many indexed URLs. It says Google has indexed 8,813 URLs of the 9,086 I submitted.

Search Engine Index Counts
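
To put the figures quoted above side by side, here is a small Python sketch of the arithmetic. The variable names are only illustrative, and the site command count is an estimate from Google, so treat the ratios as rough.

```python
# Figures quoted in the post; the site: count is Google's own estimate.
site_command_results = 17_500   # site:www.seroundtable.com
sitemap_submitted = 9_086       # URLs submitted in the Sitemap file
sitemap_indexed = 8_813         # submitted URLs reported as indexed

# Share of submitted URLs that Webmaster Tools reports as indexed.
submitted_coverage = sitemap_indexed / sitemap_submitted

# Sitemap-reported indexed URLs as a share of the site: estimate.
share_of_site_count = sitemap_indexed / site_command_results

# URLs counted by the site command beyond the Sitemap report.
unaccounted = site_command_results - sitemap_indexed

print(f"{submitted_coverage:.1%} of submitted URLs reported as indexed")
print(f"{share_of_site_count:.1%} of the site: estimate")
print(f"{unaccounted:,} URLs not covered by the Sitemap report")
```

So nearly all of the submitted URLs are indexed, yet they make up only about half of what the site command counts.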

For me, the answer is simple. I seem to be sending only the URLs of the individual blog posts here. So although I have 9,000+ blog posts at this domain, I have about twice as many pages on this site, due to the categories, date archives, tag landing pages and so on. Those pages are not included in my Sitemap file. So Google seems to be showing only the indexed URLs from what I submitted. Of course, it is hard for me to validate that by just looking at the numbers.
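
One way to check this beyond the raw counts would be to compare a list of indexed URLs against the URLs in the Sitemap file. The following is a minimal Python sketch under several assumptions: the list of indexed URLs has been collected by hand (for example, copied from site command results), the Sitemap follows the standard sitemaps.org format, and the function names and the example post URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a Sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def split_by_sitemap(indexed_urls, xml_text):
    """Split indexed URLs into those listed in the Sitemap and the rest."""
    listed = sitemap_urls(xml_text)
    in_sitemap = [u for u in indexed_urls if u in listed]
    not_in_sitemap = [u for u in indexed_urls if u not in listed]
    return in_sitemap, not_in_sitemap


# Hypothetical Sitemap containing a single blog post URL.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.seroundtable.com/example-post.html</loc></url>
</urlset>
"""

# Hand-collected URLs: one (hypothetical) post and one tag landing page.
indexed = [
    "http://www.seroundtable.com/example-post.html",
    "http://www.seroundtable.com/tag/doubleclick",
]

in_sitemap, not_in_sitemap = split_by_sitemap(indexed, SAMPLE_SITEMAP)
print("Indexed and in Sitemap:", in_sitemap)
print("Indexed but not in Sitemap:", not_in_sitemap)
```

If the explanation above holds, tag, category and date archive pages should land in the second list, while the count in the first list should track the Sitemap report.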

What I found interesting is that when I went to Yahoo's Site Explorer, Yahoo told me they have indexed 16,498 of my pages, but crawled only 15,022 pages and thus know about 16,498 of my pages. I guess via linkage data, they can index more of my pages than they actually crawl?

Search Engine Index Counts

In fact, Yahoo's numbers for an inurl:seroundtable.com command are almost on target with the numbers they report in Site Explorer, which is nice.

As for what is going on with Google... I am not sure if the results are accurate or not. Tedster at WebmasterWorld said:

I'm never surprised when Webmaster Tools information seems peculiar in some way - it happens a lot. Also note that site:example.com results are getting weirder and weirder, often omitting urls that definitely are in the index - sometimes with a simple site:example.com/directory/ query.

Forum discussion at WebmasterWorld.

01/08/2009 03:22 pm

I'm not sure, but it could be the first result is returning total pages it has indexed, and the second/lesser result is displaying the indexed pages, but not including those that could be in their supplemental index.

No Name

01/08/2009 03:31 pm

I was wondering the same thing a couple of months back and did some testing. Based on my results the indexed URL's they are reporting in Webmaster tools are only the URL's which are indexed and listed in the sitemap.xml file. If you don't include a URL in the file even though it is available and indexed, Google won't report it as indexed. I like this feature because it helps me understand how many of the pages I feel are important are indexed and gets rid of the noise in the results of the site command search on Google

No Name

01/09/2009 06:19 am

Hi, While you look @ the sitemap (XML) and the result displayed for site query in google, and compare the links then we can see that the pages for each tag are also being indexed but those pages are not included in the sitemap file. See here for some example links, www.seroundtable.com/tag/digitalpoint www.seroundtable.com/tag/doubleclick www.seroundtable.com/tag/acquisitions

Barry Schwartz

01/09/2009 10:59 am

No Name, that is what I said in the post.


06/28/2010 09:10 pm

Good article you've posted here, thanks, just to make a quick note to anyone interested in self SEO, that once you're done submiting your website to the important search engines, you could do a quick check for your website in http://ministatus.com and see the exact number of indexed pages or the number of backlinks (according to most important engines) ... more you could do a daily check and see the changes or just download your seo score in PDF file and hand it to someone who knows what to make of it ;-)


08/09/2013 07:58 am

My site shows more indexed pages but I have only 7 pages in my site. http://dhivyarajashruthi.in

