How Google Handles Duplicate Content Scrapers

Jun 11, 2008 - 9:56 am

A Google Webmaster Central blog post by Sven of the Search Quality Team addresses common concerns about how Google handles scraped content. Of the two scenarios discussed, one (duplicate content within your own domain) is under your control; the other (your content scraped and republished on other sites) is not. For syndicated content, however, he recommends asking your partners to include a link back to the original:

In cases when you are syndicating your content but also want to make sure your site is identified as the original source, it's useful to ask your syndication partners to include a link back to your original content

If scraped content ranks higher than the original content, it is probably a technical issue on your end (a "rare case," says Sven). You should check that the content is not blocked by robots.txt, review your Sitemap file for any recent changes, and make sure the site complies with the Google Webmaster Guidelines.
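
To illustrate the robots.txt part of that checklist, here is a minimal Python sketch (not from the blog post) using the standard library's urllib.robotparser to confirm Googlebot isn't accidentally blocked from your original article. The domain, paths, and robots.txt rules are made-up examples.

```python
# Minimal sketch: verify your own content isn't accidentally blocked by
# robots.txt, one of the technical checks Sven suggests. The domain,
# paths, and rules below are hypothetical examples.
import urllib.robotparser

# A sample robots.txt that accidentally blocks the article directory.
sample_robots_txt = """
User-agent: *
Disallow: /articles/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(sample_robots_txt.splitlines())

# Check whether Googlebot may fetch the original article URL.
url = "https://www.example.com/articles/original-story.html"
if parser.can_fetch("Googlebot", url):
    print("Googlebot can crawl", url)
else:
    print("Googlebot is blocked from", url, "- fix robots.txt before blaming scrapers")
```

For a live check against your own site, you could instead call set_url() with your real robots.txt address and read() it before testing URLs.
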

The article concludes with some hopefully reassuring text:

To conclude, I'd like to point out that in the majority of cases, having duplicate content does not have negative effects on your site's presence in the Google index. It simply gets filtered out.

WebmasterWorld members are not sure the article is entirely accurate. One point of contention is the claim that scraped content ranking higher than the original is a "rare case" (and that, when it happens, it's really a technical issue you'd be responsible for). Personally, I agree with the forum members here, as I've seen (and read about) the same thing happening many times.

Tedster makes an interesting observation: Google might get it right 99% of the time, but the 1% they don't get right will still bug you.

Another forum member says that scraped content is sometimes disguised so well that it's hard for Google to figure out which site is the original source:

Our biggest problem was (is) with scrapers that take our content and embed it on their site wrapped around their navigation, etc. They make a very good effort to masquerade as a legitimate site, which makes it very difficult for Google.

Overall, webmasters agree that content scrapers aren't easy to stop; they are getting smarter and will try to avoid being detected by the big G.

Forum discussion continues at WebmasterWorld.

 
