Google Can't Find The Original Source Of Content?

Aug 11, 2010 • 8:39 am | comments (9) | Filed Under Google Search Engine Optimization
 

Since the MayDay update there has been a spike in complaints about sites with stolen content outranking the sites they stole the content from.

This is an old issue, and one Google was typically not too bad at handling. Say you have five web sites: one wrote the content first, then the other four snatched the content from the original source. Google was typically good at knowing which site was the original source and ranking it higher, regardless of which site had more PageRank and authority.

Since MayDay, it appears that original source detection has gone a bit haywire.

There is a large WebmasterWorld thread and a new DigitalPoint Forums thread with constant complaints about Google ranking scraper sites above the original source.

Is it Caffeine or MayDay related? Is it that Caffeine is discovering content faster on the scraper site, thus giving the scraper the original source credit (if it works that way)? Or is it a ranking algorithm change with MayDay that treats original source credit as less valuable than other criteria?

Tedster's theory:

It comes from Mayday giving good rankings to "sites" they feel are more popular - and therefore better over all destinations for the search user. The emphasis used to be more on the "page" rather than the "site".

Have you found this to be a larger issue since MayDay/Caffeine?

Forum discussion at WebmasterWorld and DigitalPoint Forums.

 

Comments:

Mike

08/11/2010 02:09 pm

"The emphasis used to be more on the 'page' rather than the 'site'." I thought this was the other way around, where large sites could rank based on their authority, and after MayDay Google seems to be treating sites more as individual pages by focusing more on off-site metrics of individual pages as a stronger indicator of value.

No Name

08/11/2010 02:45 pm

We have a scraper in our niche, stealing content from many sites including mine. He got his AdSense banned for that. Since Caffeine/MayDay, that site has been doing very well despite its AdSense being terminated for stolen content. It's making us shout "may day!" to Google ourselves.

Amanda

08/11/2010 02:52 pm

I have seen a lot of this lately. I went to G blog search and searched for "You have 5 web sites, one web site wrote the content first, then the other four snatched" and this article did not come up but this one did... http://www.jatinmahindra.com/2010/08/11/google-cant-find-the-original-source-of-content/ I guess they are more of an authority than this site. NOT!

Brian

08/11/2010 03:49 pm

Google's results are becoming worse by the day...

Rob Abdul

08/11/2010 03:58 pm

What a total nightmare!

Jacob

08/11/2010 06:31 pm

If someone steals your content, you can file a DMCA notice with Google AdSense (Google cannot profit off stolen content), the hosting provider and Google themselves. Google can remove copyrighted content from the search results. They will replace any searches with your letter. These guys are scraping all of your content plus related sites, so they have your content and plenty that is similar. The scraper sites have lots of content in a narrow niche as they can get all the content they want for free.

Joshua Dorkin

08/11/2010 09:50 pm

In a post from a while back, Matt Cutts said to ignore the scraper sites, as their links back to your original article are essentially good for your site. Seems logical, but I'm just not a fan of other sites using my content as their own . . . what would inflame me would be if any of them outranked me. If Google is not helpful, sending a DMCA takedown request to the host of the scraper is usually very effective.

Jon

08/12/2010 08:53 am

I've noticed this also. One general culprit that most of you will have been affected by is Domaintools.com, which scrapes content and can easily outrank many of your pages due to its strong PR. To prevent this you can add this to your robots.txt:

User-agent: SurveyBot
Disallow: /

which in time should filter out their duplicate content. I don't know how often they update their index so it could be some time before this takes effect.
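
Before relying on a robots.txt rule like the one in the comment above, it can help to check that it parses the way you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are hypothetical, and whether the crawler in question really identifies itself as SurveyBot and honors robots.txt is the commenter's claim, not something this check can confirm.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted in the comment above.
ROBOTS_TXT = """\
User-agent: SurveyBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it only evaluates the rules
    locally and says nothing about how a real crawler behaves.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder; substitute a page from your own site.
    page = "http://www.example.com/some-article.html"
    for agent in ("SurveyBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With only these two directives, the rule blocks SurveyBot from every path while leaving other crawlers such as Googlebot unaffected. Keep in mind that robots.txt is advisory, so it only stops crawlers that choose to obey it.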

jim

08/12/2010 01:36 pm

I truly hate the new Google search results. I can never seem to get exactly what I'm looking for anymore. I need to dig deep and do extra work to find what I want. Why the hell would you change it... if it ain't broke, don't fix it. Oh, frustrated.
