Google: Empty Pages Are Duplicate Content Issues

Jan 19, 2011 • 8:33 am | comments (10) | Filed Under Google Search Engine Optimization
 

I am sure you have, at some point, run across a web page that returns nothing at all. Yes, a page with no content, no navigation, just a blank white page. This may be the result of a server issue, an HTML issue or a security prevention method gone bad.

A Google Webmaster Help thread has one such webmaster who had this issue, and it led to serious Google ranking consequences.

What surprised me a bit is the response from Google's JohnMu. Instead of saying that Google removed the pages because it could not determine their content, he said that Google sees multiple blank/empty HTML pages on the site and thus considers them duplicates of each other.

John said:

It looks like some of the pages from your site have been returning empty HTML pages with almost no content. In general, when this happens, we may assume that you're trying to return this content on purpose, and if it's identical with other content that we've found on your site, we might assume that they're duplicates. If you are not aware of this issue, it may make sense to check with your web-hosters to find out more -- perhaps it's a part of a "security" mechanism?

I would think Google wouldn't consider them duplicate, but rather throw them into a bucket of error pages and recrawl them on a schedule.

Maybe I am misunderstanding. But I felt the distinction between a bad error page and a duplicate content issue caused by a blank HTML page was an interesting one for SEOs to chew on.
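
To make the distinction concrete, here is a minimal Python sketch of why a set of empty pages can look like duplicates of each other. It is only an illustration under simple assumptions, not a description of how Google actually detects duplicates: the `fingerprint` and `group_duplicates` helpers and the sample URLs are hypothetical, and the comparison is a plain hash of the page body with whitespace removed.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(html: str) -> str:
    """Hash the page body with all whitespace removed.

    This is a crude stand-in for real duplicate detection (an assumption
    made for illustration only).
    """
    normalized = "".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose bodies share the same fingerprint."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    # Only groups with more than one URL count as duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical site: two pages came back empty, one has real content.
pages = {
    "/products/1": "<html><body></body></html>",
    "/products/2": "<html><body>\n</body></html>",
    "/about": "<html><body><p>About us</p></body></html>",
}

print(group_duplicates(pages))
```

The two empty product pages land in the same group, while the page with real content stands alone. From a crawler's point of view, a 200 response with an empty body is simply content that happens to match other content on the site.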

Forum discussion at Google Webmaster Help.

 

Comments:

susanbain

01/19/2011 01:56 pm

Good to know. It makes sense in cases where I have seen this after major site redesigns. When testing redirects, I find I have to manually review whether a page actually displays content and not just rely on the status code returned from an automated tool. The red flag for me now when testing redirects is a status code of "200". When I see that, I manually test the redirect and 9 times out of 10 see the blank page you are referring to. So, at least in this case, I see why Google would not see it as an error page, since we are returning a good status code vs. a 404. However, treating it as duplicate content seems odd. I would love to hear what others are doing to test and find these kinds of pages. For larger eCommerce sites like the ones I work with, it's sometimes a needle in a haystack to identify them.
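
One possible way to automate the manual check described above is to look at the visible text of a page as well as its status code. The sketch below uses only the Python standard library. The `is_blank_200` helper and the 20-character threshold are assumptions chosen for illustration, and fetching the pages (for example with a crawler) is left out so the logic stays self-contained.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that is not inside <script>, <style> or <title>."""

    SKIPPED_TAGS = ("script", "style", "title")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def is_blank_200(status_code, html, min_chars=20):
    """True when the server said 200 but the page has almost no text.

    min_chars is an arbitrary threshold (an assumption); tune it per site.
    """
    return status_code == 200 and len(visible_text(html)) < min_chars


# Hypothetical crawl results: (url, status code, response body).
results = [
    ("/old-category", 200, "<html><head><title>Shop</title></head><body></body></html>"),
    ("/new-category", 200, "<html><body><p>Plenty of real product copy lives here.</p></body></html>"),
    ("/missing", 404, ""),
]

suspects = [url for url, status, body in results if is_blank_200(status, body)]
print(suspects)
```

Running a check like this over a crawl export flags only the pages that answer 200 with next to nothing in the body, which narrows the haystack before any manual review.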

Mark - worlds largest plr sell

01/19/2011 04:31 pm

No content pages as duplicate content! lol! That's really ridiculous! I think this has once again proved that bots are just bots and they can't judge like humans! Anyway, this is something to be noted by webmasters whose websites have empty HTML pages.

CARGUY!@11

01/19/2011 04:41 pm

Google, can't live with them, can't live without them. I am loving this website, learning a lot and I really am hoping I can get to the point where I can build my site up NJ Chevrolet

Kevin Spence

01/19/2011 05:01 pm

I don't think I'd say this is a duplicate content 'issue' -- unless you're trying to rank for a bunch of spaces.

Michael Martinez

01/19/2011 05:28 pm

I always embed a robots meta tag with "noindex,noarchive" directives on intentionally blank pages. I never saw the point in allowing such content to be crawled. If such pages are being produced by accident, one would hope they are discovered and fixed.
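
For anyone who wants to verify that intentionally blank pages really carry such a tag, here is a small Python sketch using only the standard library. The `robots_directives` helper is hypothetical, and it only inspects a `<meta name="robots">` tag in the HTML. It does not look at an `X-Robots-Tag` HTTP header or at robots.txt.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in the page."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Hypothetical placeholder page with the directives mentioned above.
placeholder = (
    '<html><head><meta name="robots" content="noindex,noarchive"></head>'
    "<body></body></html>"
)

print(sorted(robots_directives(placeholder)))
```

A blank page whose directive set lacks `noindex` is a candidate for either real content or the tag.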

Anne

01/19/2011 06:12 pm

At times web pages are built and pushed online before the content is ready. Rather than leaving the pages blank, either add a noindex tag or just mention Coming Soon, but only if the pages are going to be ready in a short span.

Nazhiker

01/19/2011 11:20 pm

it's not hard to scrape a blank page so I'm glad they are on top of that.

Tdionisiou SEO

03/16/2011 11:05 pm

Yes that's correct. We should avoid publishing stubs. Bots and especially visitors don't like seeing "empty" content pages. If we can't avoid the creation of place-holder pages we always must use the noindex meta tag to block these pages from being indexed by googlebot. -- Great post author and thanks for sharing it with us

sunset canvas

03/31/2011 03:18 pm

What a bizarre response indeed, this is an example of google misinformation??

adfsfaa

04/06/2012 06:38 am

waste
