Google's New Advanced Index Status Report

Jul 25, 2012 • 8:18 am | comments (14) | Filed Under Google Search Engine Optimization
 

Google added a new feature to Google Webmaster Tools last night that shows a detailed breakdown of your index status.

The report does more than show the number of pages Google has crawled. It breaks that number down into pages that were crawled and indexed, pages that were crawled but not indexed, and pages that Google tried to crawl but found blocked.

Here is what it means:

  • Total indexed: The total number of URLs from your site that have been added to Google's index.
  • Ever crawled: The cumulative total of URLs from your site that Google has ever accessed.
  • Not selected: URLs from your site that redirect to other pages or URLs whose contents are substantially similar to other pages.
  • Blocked by robots: URLs Google could not access because they are blocked in your robots.txt file.
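To get a rough idea of which URLs might end up in the "Blocked by robots" bucket, you can test them against your robots.txt rules locally. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt rules, the example.com URLs and the `blocked_urls` helper are made up for illustration, and Python's matching is not guaranteed to agree with Googlebot's in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs on a placeholder domain.
urls = [
    "http://www.example.com/",
    "http://www.example.com/private/report.html",
    "http://www.example.com/search?q=widgets",
    "http://www.example.com/products/widget",
]

blocked = blocked_urls(ROBOTS_TXT, urls)
for url in blocked:
    print("blocked:", url)
```

With these sample rules, the `/private/` page and the `/search` URL are reported as blocked, while the home page and the product page are not.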

Let me show you what various sites look like in this chart view.

A normal-looking site's advanced index status:

Google's Advanced Index Status Report - Normal

A site that redirected its URLs to a new site (shared by JohnMu):

Google's Advanced Index Status Report - Redirect

A new site just starting to get indexed with lots and lots of pages:

Google's Advanced Index Status Report - New Large Site

One key point, as Google said:

Notice that the counts are always totals. So, for example, if on June 17th the count for indexed pages is 92, that means that there are a total of 92 pages indexed at this point in time, not that 92 pages were added to the index on that day only. In particular for sites with a long history, the count of pages crawled may be very big in comparison with the number of pages indexed.
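Because every count is a running total, the net change between two readings is simply the difference between them. Here is a minimal sketch of that arithmetic. Only the figure of 92 on June 17th comes from Google's note; the other readings and the `daily_changes` helper are invented for illustration.

```python
from datetime import date

# Hypothetical "Total indexed" readings. Only the 92 on June 17th
# comes from Google's note; the other values are made up.
total_indexed = [
    (date(2012, 6, 15), 85),
    (date(2012, 6, 16), 90),
    (date(2012, 6, 17), 92),
]


def daily_changes(series):
    """Return (day, net change) pairs for each reading after the first."""
    return [
        (day, count - prev_count)
        for (_, prev_count), (day, count) in zip(series, series[1:])
    ]


changes = daily_changes(total_indexed)
for day, change in changes:
    print(day.isoformat(), f"{change:+d}")
```

The difference is a net figure: a change of +2 could mean two pages were added, or five added and three dropped. A negative value means more pages left the index than entered it between the two readings.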

This is an outstanding tool and I think many SEOs and webmasters will benefit from it.

Forum discussion at Google Webmaster Help and Google+.

Comments:

William Vicary

07/25/2012 12:28 pm

It's a nice start, but I wouldn't say it's "outstanding" quite yet. When they allow us to export the pages to CSV, then I'll agree with you. At the moment, IMO, it's a fancy graph that lets you see trends but not solve problems.

i_praveensharma

07/25/2012 12:28 pm

That's a great inclusion. But I can see only the number of crawled, blocked, etc. pages, not the actual URLs. So it only shows numbers, not the actual URLs that have been crawled?

Andy Sheridan

07/25/2012 01:35 pm

I don't see the benefit - apart from being able to confirm that yes indeed, you have been mercilessly kicked out of the index after tonight's Panda update.

Ralph Slate

07/25/2012 01:41 pm

I have a problem with mine - and I don't know how to interpret it. My "ever crawled" number is 17,314,223 and it shows that number consistently from 7/2011 to today. That number causes the scale of the graph to be so high that the other numbers are dwarfed (they appear as straight lines at the bottom). I have no idea what that high number even means. If I deselect the "ever crawled", I still have a problem because in 7/2011, my "Not selected" number is 14,867,057. It drops to 413,576 in 8/2011, but again, that one number screws up the scale of the graph. When I deselect "not selected", the graph becomes meaningful, with total indexed being in the 450k range and blocked by robots being in the 125k range. Should I be worried about those first two insanely high numbers? Could they be causing me trouble in Google's algorithm?

Andy Sheridan

07/25/2012 01:53 pm

The cumulative figures are always going to be significantly higher than currently indexed or blocked. If those latter numbers seem excessive though, I suggest you thoroughly examine your index profile and check for crawl errors using WMT.

Cheryl

07/26/2012 02:47 am

My thoughts exactly.

Soma Sundaram

07/26/2012 04:16 am

There is no use in knowing just the count. We need to see the actual URLs that are not crawled so that it will be helpful to a webmaster. Anyway, we always welcome this kind of feature.

smiyff

07/26/2012 11:05 am

Although it's good to see more functionality in Webmaster Tools, this isn't really all that useful. What would be much more insightful is the ability to extract some of the URLs in each bucket. If we could extract even a sample of what Google classifies as "not selected", we'd get a better understanding of problem pages.

Jon H

07/26/2012 02:46 pm

So what does "not selected" include? It appears to cover 301s, 302s and duplicate pages identified through the canonical element or Google's guesswork, and it seems to ignore 404s. Do people agree with that? The total indexed count also seems to include pages with filters and penalties applied. I agree the tool is good, but it's not great. Give us a CSV file please, Google.

Murthy Seo

07/27/2012 04:14 am

Good to see the additional feature! Is it possible to view all of those URLs by category? For example, I want to see the indexed URLs, the ever crawled URLs, etc.

Bob Shirilla

07/27/2012 03:12 pm

How can I determine which pages on my site are "Not Selected"?

Myindustry

07/28/2012 06:20 pm

This is my question too.

Yw

07/31/2012 04:14 am

I wonder if having a higher number of URLs "not selected" compared to the number of URLs "indexed" might be an indication of the quality of the site content? What do you folks think?

problemmii

10/28/2012 02:42 am

On my website, www.problemmi.com, the unselected links are 3 times the indexed links. Thanks again.
