Tons Of "Not Selected" In Google Index Status?

Nov 28, 2012 • 8:53 am | Filed Under Google Search Engine Optimization
 

About five months ago, Google introduced additional index status reports in Google Webmaster Tools that show a lot more information about how Google indexes your site.

Recently, a lot of webmasters have been asking about the "not selected" number. Technically, Google explains it as "URLs from your site that redirect to other pages or URLs whose contents are substantially similar to other pages."

John Mueller of Google explained it a bit more in depth in this Google Webmaster Help thread saying:

The number of "not selected" URLs is based on URLs that are either substantially similar or redirecting -- if you have changed your site's URL structure and have redirected those URLs, then that would be a good explanation for that. That curve would also be fine and not a signal of a problem.

Now, what happens when your "not selected" number is well above your other numbers, such as the indexed count? Should you be worried? I might be. It may be a sign of structural issues with the site, such as a large number of redirecting or near-duplicate URLs.

Here is one chart of someone who has such an issue:

[Image: Google Index Status Not Selected]

Now, to me, this is a bit scary. Can a site like this do well in Google? It is possible, but look at all that lost potential.

Forum discussion at Google Webmaster Help & WebmasterWorld.

 

Comments:

gabs

11/28/2012 02:07 pm

I have this on quite a few sites and it makes sense, as these sites have millions of user-generated pages, so pages like "post by user x" will of course be similar to the pages those posts appear on. Pagination will also affect this.

hGn

11/28/2012 02:26 pm

The "Subscribe To This Discussion" option isn't working?

Josh

11/28/2012 04:44 pm

Bingo. Agreed 100%.

FP Marcil

11/28/2012 05:07 pm

Not sure how the "not selected" trend affects results. I have not seen any correlation in my data that would indicate that Google uses this in any way. However, the "selected" curve is something I would look at, particularly if it doesn't trend with the "not selected" curve. Panda-affected sites often have a low "selected" count, but I wouldn't say that I have a satisfying statistical correlation to make a bolder statement about it just yet. (This statement would be: “a low selected pages count is a sign that you may be affected by Panda or at risk of being affected in the relatively near future”.)

josh bachynski

11/28/2012 05:13 pm

it's not a ranking problem - it just means those were not good ranking candidates - it COULD be a crawling problem, as once you get to around 100,000 pages Googlebot slows down so it doesn't bog down your server - then it turns into a ranking problem, as you are not as fresh as other sites

Roger

11/28/2012 07:19 pm

Could it also be high when you have lots of pages with a noindex meta tag, for example internal search results? Google can crawl those pages and follow the pagination of the search results, but all of them are noindexed.
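For anyone who wants to see how many of their own pages carry a noindex directive, here is a minimal sketch using only the Python standard library. It assumes you already have each page's HTML as a string; the class and function names are made up for illustration. Whether Google actually counts noindexed pages under "not selected" is not confirmed anywhere in this thread, so treat it as a way to gather numbers, not as an explanation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def is_noindexed(html):
    """Return True if the page's robots meta tag asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives)
```

Running `is_noindexed` over a crawl of your internal search result pages gives a count you can compare against the "not selected" number in the report.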

Jim Christian

11/28/2012 09:36 pm

I can tell you our issue. We use query string variables on our URLs, and we noticed that sometimes our developers don't add a rel canonical tag to the page. Without rel canonical, every combination of query string variables counts as a separate URL, which hugely inflates the number of similar pages. We just started looking into the problem last week and are now adding the canonical tag. What would be REALLY helpful is to be able to download a list of these similar pages. That way we could identify where the problem is rather than auditing the entire site. To add: it could also be an international site issue with hreflang.
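Since Google doesn't provide the list of URLs, one way to audit this yourself is to take URLs from your server logs or a crawl and group the ones that collapse to the same canonical form. A rough sketch, assuming a hypothetical set of parameters (`IGNORED_PARAMS`) that don't change page content on your site; you would need to adjust that set, and all function names here are illustrative:

```python
from collections import defaultdict
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    )
    # The fragment is discarded; it is not part of what a crawler requests.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each canonical URL to its variants, keeping only groups with more than one."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {canon: variants for canon, variants in groups.items() if len(variants) > 1}


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag a page at `url` could emit."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

The groups returned by `group_duplicates` show which templates generate the most query string variants, which is a reasonable place to start adding the canonical tag.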

Richard Gailey

11/29/2012 02:07 am

Here's mine: http://imgur.com/wxgSk The "not selected" numbers are higher, as you have reported. My traffic has been fairly steady though, bar the dip/rise depending on which algorithm update they come up with. I noticed this quite a while ago now and was pretty concerned. The explanation that Google gave didn't really make sense to me, but seeing that traffic continued to remain consistent, I didn't really give it much thought. Will be interested to hear people's thoughts on this though.

chris

11/29/2012 09:48 am

Does anyone know the difference (if any) between supplemental index results and "not selected" pages please?

Jodi

11/29/2012 04:29 pm

Could it also be due to having an aggregated news feed on one's site? We only have 83 pages indexed, but our "not selected" pages total 1,889. We only have a few pages which have query strings (48 to be exact), but we have an aggregated news feed which pulls in stories from several other sites.

Greg Fowler

11/29/2012 04:56 pm

I am having similar issues with all of my sites. I am just beginning to wonder what exactly you are supposed to do with the data, since Google doesn't give you the URLs.

John Doherty

11/29/2012 11:53 pm

Hey Gabs - What do you mean by "pagination will also affect this"? Can you give some examples?

Aaron

12/04/2012 05:34 am

Having this issue now, wish they provided URLs... I assume this could be an issue with WordPress categories and tags.

gabs

12/04/2012 11:37 am

Take a site that has category pages that you add new stuff to all the time, which pushes what was on page 1 to page 2. Google would see the two pages as the same until it recrawled page 1 and saw that it was different.

Ramesh Nair

12/14/2012 05:59 pm

I see the spike on our corporate site too. Though I haven't seen any drop in traffic or ranking due to this, I think changing the structure a bit, proper canonicalization, seo-friendly pagination and avoiding duplicate title tags and meta descriptions may help.

Babita Tomar

02/04/2013 11:32 am

You are right, Gabs. That may be the cause of this problem.
