When Google's Algorithms Don't Index Your Content

Jan 14, 2013 • 9:08 am | Filed Under Google Search Engine Optimization

A Google Webmaster Help thread has one webmaster upset that his index count, compared to the number of pages submitted via his Sitemap file, keeps going down instead of up.

One of the best ways to see how many of your site's pages Google has indexed is to submit an XML Sitemap file and compare the number of URLs submitted with the number indexed. If the two counts are close, that is a good sign. If the indexed count keeps rising, that is also good. If it keeps falling, there is likely a problem.
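The comparison described above is simple arithmetic once you have the numbers. Below is a minimal sketch, assuming a standard sitemaps.org `urlset` file. The indexed figure itself has to come from Google's own report (the Sitemaps page in Webmaster Tools at the time), so it is passed in by hand. The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap with four URLs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <loc> entries submitted in a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


def index_ratio(submitted: int, indexed: int) -> float:
    """Fraction of submitted URLs that Google reports as indexed."""
    if submitted == 0:
        return 0.0
    return indexed / submitted


def trend(indexed_history: list[int]) -> str:
    """Classify a series of indexed counts, oldest first."""
    if len(indexed_history) < 2:
        return "not enough data"
    pairs = list(zip(indexed_history, indexed_history[1:]))
    if all(new < old for old, new in pairs):
        return "falling"
    if all(new >= old for old, new in pairs):
        return "rising or flat"
    return "mixed"


if __name__ == "__main__":
    submitted = count_sitemap_urls(SAMPLE_SITEMAP)
    print(submitted, index_ratio(submitted, 3), trend([900, 850, 800]))
```

A ratio near 1.0 with a "rising or flat" trend is the healthy case; a "falling" trend is the situation the webmaster in the thread was seeing.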

So this webmaster asked what the problem was, and Gary Illyes from Google explained that Google's algorithms were choosing not to reindex many of the pages. He wrote:

As we improve our algorithms, they may decide to not reindex pages that are likely to be not useful for the users. I took a look on the pages that were once indexed but currently aren't and it appears there are quite a few that have no real content.

He showed examples of pages that are soft 404s (the page says "page not found" but the server returns a 200 status code in the HTTP header). He also showed examples of blank pages that had been indexed. And he pointed to URLs listed in the Sitemap that are not the canonical versions of those pages.
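The three problems above can be checked on your own pages before Google finds them. Here is a rough sketch using only Python's standard `html.parser`. The "not found" phrases and the 20-word threshold for a blank page are assumptions made up for illustration; Google's actual soft-404 detection is not public. It flags a 200 response that reads like an error page, a 200 response with almost no visible text, and a page whose `rel="canonical"` link points somewhere other than the URL listed in the Sitemap.

```python
from html.parser import HTMLParser

# Hypothetical phrases; Google's real soft-404 signals are not published.
NOT_FOUND_PHRASES = ("page not found", "no longer exists")


class PageScanner(HTMLParser):
    """Collects the canonical link and visible text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def audit_page(sitemap_url, status, html, min_words=20):
    """Return a list of likely indexing problems for one Sitemap URL."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    text = " ".join(scanner.text_parts)
    issues = []
    if status == 200 and any(p in text.lower() for p in NOT_FOUND_PHRASES):
        issues.append("soft 404")
    elif status == 200 and len(text.split()) < min_words:
        issues.append("thin or blank")
    # Trailing slashes are ignored so "/c" and "/c/" count as the same URL.
    if scanner.canonical and scanner.canonical.rstrip("/") != sitemap_url.rstrip("/"):
        issues.append("non-canonical URL in sitemap")
    return issues
```

Note that a page returning a real 404 status is not flagged: that is the correct response for missing content.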

Healthy sites need healthy URLs, real content, clean redirects and proper HTTP header responses. Otherwise, Google may stop indexing those URLs and, worse, stop crawling them.
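On the serving side, "proper HTTP header responses" mostly means sending the status code that matches what the page actually is. Below is a small sketch written as a standard-library WSGI app. The page store, paths and redirect map are hypothetical. Missing pages get a real `404 Not Found` rather than an error message served with `200 OK`, and moved pages get a `301` with a `Location` header.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical content and redirect map, for illustration only.
PAGES = {
    "/": "<h1>Home</h1><p>Welcome to the site.</p>",
    "/widgets": "<h1>Widgets</h1><p>Our full widget catalog.</p>",
}
REDIRECTS = {"/old-widgets": "/widgets"}
HTML = ("Content-Type", "text/html; charset=utf-8")


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # Permanent redirect: tells crawlers which URL to keep.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    body = PAGES.get(path)
    if body is None:
        # A real 404 status, not a "not found" page sent with 200 (a soft 404).
        start_response("404 Not Found", [HTML])
        return [b"<h1>Page not found</h1>"]
    start_response("200 OK", [HTML])
    return [body.encode("utf-8")]


def simulate(path):
    """Call the app directly and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

To try it in a browser, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it locally; a header checker will then show `404` for `/missing` and `301` for `/old-widgets`.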

Forum discussion at Google Webmaster Help.
