Google: Duplicate Content Pollutes 25-30% Of The Web

Dec 17, 2013 • 8:44 am | comments (5) | Filed Under Google Search Engine Optimization

We all know Google's take on duplicate content: it is not spammy, and it is not something Google penalizes unless you are being completely evil. We also know that if you have exactly the same content as someone else, it may be hard for your content to rank in Google, because Google shows only one of the duplicate pages.

The news is that Matt Cutts of Google said that somewhere between 25% and 30% of the web is duplicate content. Meaning over one-quarter of the pages, content, files, images, etc., on the web are replicated. Can you imagine?

It might seem like a lot to a normal user, but it probably doesn't seem like a lot to you and me, because as soon as I publish this story, 15 other sites will scrape it and post it on their own sites. So this one piece of content will be duplicated 15 times or so.

But as users, because search engines cluster duplicate pages and show only one of them, we don't realize how much of the web is duplicated.
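To make the clustering idea concrete, here is a toy sketch of near-duplicate detection using word shingles and Jaccard similarity. This is an illustration of the general technique, not Google's actual pipeline; the document strings below are made up for the example.

```python
# Toy near-duplicate detection (an illustration, NOT Google's actual algorithm):
# split each document into overlapping k-word "shingles" and compare the
# resulting sets with Jaccard similarity. Near-duplicates share most shingles.

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union| (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical example documents: a story, a scraped copy, and unrelated text.
original  = "Google says duplicate content makes up a quarter of the web"
scraped   = "Google says duplicate content makes up a quarter of the web today"
unrelated = "The quick brown fox jumps over the lazy dog"

print(jaccard(shingles(original), shingles(scraped)))    # high: near-duplicate
print(jaccard(shingles(original), shingles(unrelated)))  # near zero: distinct
```

A search engine computing scores like these at scale could group high-similarity pages into one cluster and pick a single representative to show, which is why the duplication is invisible to searchers.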

Cool, no? That depends on whether your site is the one that makes it to the top.

Forum discussion at Twitter and Google+.



Michael Martinez

12/17/2013 03:08 pm

I think this video is long overdue. People remain very concerned and sometimes overly alarmed about duplicate content. One of the most common questions we have to deal with at Reflective Dynamics is something along the lines of "but won't that be duplicate content?" Yes, sometimes you create duplicate content, but more often than not that just forces the search engine to make a choice about what to display; it doesn't necessarily mean you'll be penalized or downgraded. That said, many of the sites that were downgraded by Panda did indeed have broad and extensive duplicate content problems and those issues MAY have been a factor in the downgrades.


12/17/2013 03:10 pm

You're right. Google is the best.

Marco Angelucci

12/17/2013 11:32 pm

Hi Barry! That's why this is the first video you've posted that I haven't shared on my blog... Funny, but I was sure I had seen it on Search Engine Land, Search Engine Watch and some other f*****s that I love! Keep going, M.


12/18/2013 12:21 am

It's interesting that he would even put a number on it. I wonder how accurate that really is.

Durant Imboden

12/18/2013 03:43 am

The 25-30% estimate sounds reasonable to me. Pages for products or services on e-commerce sites often use boilerplate text, and newspaper sites often run the same wire-service stories that their competitors do. Even without scrapers and spammers, there would be a lot of duplicate content on the Web.
