When Google's Algorithms Don't Index Your Content

Jan 14, 2013 • 9:08 am | comments (23) | Filed Under Google Search Engine Optimization
 

A Google Webmaster Help thread has one webmaster upset that his index count, compared to the number of pages submitted via his Sitemap file, keeps going down rather than up.

One of the best ways to see how many pages of your web site Google has indexed is to upload an XML Sitemap file and compare the URLs submitted to the URLs indexed. If the two counts are close, that is a good thing. If the number of URLs indexed keeps going up, that is a good thing. If that number keeps going down, there is likely a problem.
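As a rough illustration of that comparison, here is a minimal Python sketch. The weekly snapshot numbers are made up for the example, and the function names are hypothetical helpers, not anything Webmaster Tools provides; you would copy the real counts by hand from the Sitemaps report.

```python
from typing import List


def indexed_ratio(submitted: int, indexed: int) -> float:
    """Share of submitted Sitemap URLs that are reported as indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return indexed / submitted


def is_declining(indexed_counts: List[int]) -> bool:
    """True when every snapshot is lower than the one before it."""
    return len(indexed_counts) >= 2 and all(
        later < earlier
        for earlier, later in zip(indexed_counts, indexed_counts[1:])
    )


# Hypothetical weekly snapshots: (URLs submitted, URLs indexed)
snapshots = [(1000, 940), (1000, 870), (1000, 790)]

ratios = [round(indexed_ratio(s, i), 2) for s, i in snapshots]
declining = is_declining([i for _, i in snapshots])

print(ratios, declining)
```

A steadily falling series like this one is the pattern the webmaster in the thread was seeing; a single dip between two snapshots is not enough to trip the check.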

So this webmaster wanted to know what the problem was and Gary Illyes from Google explained that Google's algorithms don't want to index many of the pages. He wrote:

As we improve our algorithms, they may decide to not reindex pages that are likely to be not useful for the users. I took a look on the pages that were once indexed but currently aren't and it appears there are quite a few that have no real content.

He showed examples of pages that are soft 404s (the page says "not found" but returns a 200 status code in the HTTP header). He also showed examples of blank pages being indexed, and of Sitemap entries that reference URLs that are not canonical.
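The three problems Gary pointed out can be checked for on your own site before Google finds them. The Python sketch below is only one way to approach it, under stated assumptions: the "not found" phrases are guesses you would tune to your own templates, `classify_page` and `non_canonical_urls` are hypothetical helper names, and the `canonical_of` mapping (URL to its canonical version) is something you would have to build yourself, for example from each page's rel="canonical" tag.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed phrases; adjust them to whatever your own error template says.
NOT_FOUND_HINTS = ("page not found", "404 error")


def classify_page(status_code: int, body: str) -> str:
    """Crude check of one fetched page: 'error', 'other', 'blank', 'soft-404' or 'ok'."""
    if status_code >= 400:
        return "error"      # a real 404/410/500, reported honestly in the header
    if status_code != 200:
        return "other"      # redirects and so on, not judged here
    text = re.sub(r"<[^>]+>", " ", body)       # strip tags (rough, not a parser)
    text = " ".join(text.split()).lower()
    if not text:
        return "blank"      # 200 status but no visible content
    if any(hint in text for hint in NOT_FOUND_HINTS):
        return "soft-404"   # says "not found" while returning 200
    return "ok"


def non_canonical_urls(sitemap_xml: str, canonical_of: dict) -> list:
    """Sitemap URLs whose known canonical URL is a different address."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [u for u in urls if canonical_of.get(u, u) != u]


# Made-up example data
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""
canonical_of = {"http://example.com/b": "http://www.example.com/b"}

print(classify_page(200, "<html><body><h1>Page not found</h1></body></html>"))
print(classify_page(200, "<html><body></body></html>"))
print(non_canonical_urls(sitemap, canonical_of))
```

The text matching is deliberately simple and will miss error pages worded differently; it is meant to show the idea, not to replace a proper crawl.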

Healthy sites need healthy URLs, content, redirects and proper HTTP header responses. Otherwise, Google may stop indexing and, worse, crawling the URLs.

Forum discussion at Google Webmaster Help.

 

Comments:

joeyoungblood

01/14/2013 05:46 pm

i have examples in WMT where it shows only 2 / 1,000+ URLs indexed but more than that can be found in the SERPs. WMT is not always accurate, doesn't seem to want to admit these failings as i've tweeted them out to WMT team members and gotten back replies that nothing is wrong with WMT search query counts or sitemap index counts when the evidence points to the contrary.

Gary Illyes

01/14/2013 10:33 pm

Hi Joe, if you post in the forums (link in Barry's post) and then bring the thread to my attention (through Google+ for example), I am more than happy to take a deeper look at the site and its indexed URL counts. From experience, the counts are rather accurate and in the vast majority of cases the issue is canonicalization of the URLs.

RankWatch

01/15/2013 05:46 am

Basically, there is a vast difference between the numbers reported by different analytics tools, and I have seen major differences in the count of indexed pages as well. But one thing is for sure: better site content, good internal linking of the pages and a fool-proof link building methodology will get good results.

Rank Watch

01/15/2013 06:00 am

This is really the way to go as far as Google is concerned. They are doing the right thing by getting rid of the content which is of no use or adds no value to the users, which looks manipulative in nature etc. Not crawling or indexing these pages is the best thing they can do. its nice to see that they are also focusing on de-indexing the already existing pages if they are manipulative in nature.

Brad Dalton

01/15/2013 10:06 am

Not with mine. You have penalized my domain without reason and taken away 90% of my traffic even though my articles are featured every week on some of the most popular industry blogs. My search queries have gone from 17,000 to 700 even though i have added 800 posts which are all tutorials. If you're going to penalize someone, why not tell them why you do it? I have also found one article from SEOMoz which linked back to my domain has been spammed hundreds of times. This is negative SEO and i pay the price even though i have never done any link building in my life!

StevenLockey

01/15/2013 11:45 am

All penalties send a message to the account, so either you've not been penalised (it's just an algorithmic drop) or you didn't see the message. WMT is normally accurate, just a few days behind reality, except for incoming link counts.

Thomas Schulz

01/15/2013 11:54 am

I have not checked recently, but I ran this experiment *some years* back: Submit two sitemaps with the same URLs. Wait till both are fully indexed. Then remove one of them and re-add it. The newly re-added one will only slowly gain "indexed" count even though the other sitemap file containing the same URLs still says fully indexed: http://www.microsystools.com/products/sitemap-generator/help/webmaster-tools-indexed-urls/

Ken Boostrom

01/15/2013 02:28 pm

In web master tools you can force Google to index your website and individual URLs in 24 hours. Go to web master tools in your google account under products go to web master tools. If you haven't activated it then follow instructions and validate your website. Select the website in Google webmaster tools from your list. Under the Health Tab select "Fetch as Google." Here you can copy and paste URLs and submit. In 24 hours, when you check web master tools - your site will be indexed.

StevenLockey

01/15/2013 04:48 pm

It's not surprising, since they must have been on different URLs (even if only WWW to non-WWW) in order for them to both be registered. I'm not sure WMT has ever worked correctly with Sitemaps on external sites, if that's what you mean. That's why you have to be careful to make sure your sitemap has the EXACT same URL as the site in WMT (aka don't forget the WWW).

Michael Martinez

01/15/2013 06:34 pm

Wordpress installations can inadvertently create a lot of useless junk URLs -- especially if there are a lot of active plugins and/or theme options because the likelihood of misconfigured URL structures embedded inside navigational links increases. If people are publishing good content on a site but not seeing that content indexed, whereas they see high crawl counts, they should take a closer look at what is being crawled and why.

Mark Asciak

01/15/2013 09:28 pm

WMT is very, very, very slow to update, I'm only now seeing links gone from my profile that were removed 6+ months ago...

Brad Dalton

01/17/2013 01:45 pm

WMT can take months to update the data, not a few days. Anyone who uses it knows this. A 90% drop due to an algorithm change. Must have got it horribly wrong before the change, or you're favoring U.S owned domains, which I believe is the case. The problem is anyone with a decent understanding of SEO can target keyword competitors with negative SEO effectively. Using the disavow tool 6 months after the algorithm change is too late. Why didn't you launch the tool before changing the algorithm?

StevenLockey

01/17/2013 02:14 pm

Absolute nonsense. Have you got ANY evidence of negative SEO? All the tests from reputable SEOs have shown no evidence of negative SEO, and Googlers have said they haven't seen any neg SEO..... The only ones who say it exists can't show any evidence to support it. The only thing that vaguely might count as SEO is when webmasters complain that the spammy links to their site have been discounted by Google; they think they have had neg. SEO when in fact what has happened is they just lost the benefit from the spammy links. WMT is in the vast majority of cases only a few days behind the rest of Google (they can both be behind compared to website updates).

Brad Dalton

01/17/2013 02:29 pm

I have concrete evidence of negative SEO and wrote a post about it last year. The truth is Google supports negative SEO against non U.S owned domains and favors U.S owned domains in the search results. This is typical of the way business in the U.S is done and consistent with Google's reputation. There is no way to report negative SEO, nor is Google interested in investigating anything against non U.S owned domains.

Brad Dalton

01/17/2013 02:39 pm

Google's reputation is about as good as Lance Armstrong's but you'll never see any Google staff admit guilt. You'll only read denials. They will hide behind a secure office because they need to. They are guilty. The whole algorithm is being constantly tweaked in favor of U.S owned domains and anyone that supports Google. It's never been a fair index and never will be. It will always favor U.S business and continue to be built around U.S business.

StevenLockey

01/17/2013 02:51 pm

Link it then, cos if its the one I remember, everyone pointed out the holes in the logic which were ignored by the author. Like the fact the credit from the spam links had gone causing the drop, not any actual penalty. I'd like to see this 'concrete' proof because I'm 99% sure you are talking nonsense.

Brad Dalton

01/17/2013 02:58 pm

You don't even use a real image of your face. That tells me you are hiding and only here to be destructive. This is also very typical of many people on the Google webmaster forums which in my experience is absolute trash, full of trash comments and typical bullying which American controlled forums are famous for.

Brad Dalton

01/17/2013 03:00 pm

Thats true and i see that in WMT with plugin url's reporting 404's.

StevenLockey

01/17/2013 03:07 pm

That's cos it's an avatar I use for online games as well. Check my Google+ profile if you want to see my face so desperately. It should still be there, I didn't remove it anyway. Perhaps your experience on the forums was bad simply because you went there demanding you were right and that everything is Google's fault and they needed to fix it to rank your site #1. I still don't see a link to this 'concrete proof' either.......

StevenLockey

01/17/2013 05:04 pm

Do you have ANY evidence to support any of this? Or are you just butt-hurt because your spammy site got caught by Google and devalued?

PrHike Directory

01/20/2013 08:52 pm

I really don't have that problem because I submit each newly written page manually to be crawled and usually it gets indexed the same day. I don't worry too much about it because I know my content is fresh and not copied.

Gabe Garcia

03/12/2013 11:08 pm

Can anyone tell me why there is a difference in Sitemaps indexed pages and Index Status on Google WMT? I'm reviewing a site that shows 60,817 Total indexed on Index Status and 3559 Indexed on Sitemaps. Thanks!

Monique T (@NJLIMOUSINES)

04/11/2014 07:44 pm

Ding! Ding!
