Panda: What Happened To Fixing The Scraper Issue?

Jan 25, 2013 • 8:51 am | Filed Under Google Search Engine
 

Wasn't the whole point of the scraper algorithm and the continued Panda updates to make sure that quality, original-source content ranks in Google, not stolen/scraped content?

It seems like Google has recently gone back to where it was before February 2011 (maybe I am being a bit harsh) when it comes to scrapers outranking the original source.

A Threadwatch thread by Aaron Wall points out that a story that cost The Verge $5,000 to put together is not ranking in the first position for the story name in either Google web search or Google News.

The article is here, but when you search for [death of the american arcade] in Google, the number one web result is tomwoods.com, and the only Google News listing is from the king of content generation, Huffington Post. What is going on here?

I actually asked Google about a similar case involving a Search Engine Land story a week or two ago and still have no response. Search for the title of an article Danny wrote just a few days ago, Google Launches Streamlined Image Search. Google doesn't rank it anywhere on the first page of the web results for a search on that title, although it does rank in Google News.

Google Scrapers Ranking

Weren't Panda and the Scraper update supposed to fix this?

As Aaron Wall asked in the thread:

How long will publishers be able to afford to rank 3rd for their $5,000 articles, when Google keeps pumping up the rank of the $20 rehashes?

I should note that, technically, these sites are not "scrapers," but the original source should still rank number one, not the sites referencing it.

Forum discussion at Threadwatch.

Image credit to BigStockPhoto for Panda

 

Comments:

Michael Merritt

01/25/2013 02:33 pm

I get what you're saying about The Verge article, but why would Google rank Danny's article both in News and the first web result? That'd be unnecessary duplication. I could see the point if Danny's article wasn't showing up as a News article, though.

Barry Schwartz

01/25/2013 02:34 pm

Different indexes.

Vermont Design Works

01/25/2013 03:22 pm

Certainly doesn't explain everything, but in the case of the streamlined image search example, could it be that Google is now doing with news results what I've increasingly seen them do with local - pull you out of organics if you show in the "local pack" (in this case, if you show in "news"), to avoid giving anyone 2 pieces of real estate on page 1 of SERPs?

Jaan Kanellis

01/25/2013 05:17 pm

I would love to hear from Google on this. I see it happen to our content all the time. A weaker backlink profile is probably to blame.

josh bachynski

01/25/2013 05:18 pm

Not in Google's opinion. When you hear John Mueller talk about "relevancy," he basically says that relevancy is (also, and apparently mostly) a measure of user appreciation (metrics), and that this is reflected by their algorithms, which apparently can't be wrong because they reflect said user appreciation. Now, there might be something else wrong here: perhaps these main sources have some technical issue preventing them from being fully crawled, indexed, and ranked. Or maybe Google is happy to serve the first crawled version that is sufficiently appreciated by its users.

virginia

01/25/2013 05:32 pm

Yes, it's a shame. I was hoping Google was cleaning this up, but it will take time, I guess.

Alan

01/25/2013 06:15 pm

What is laughable is that some random G+ account ranks better than Search Engine Land. Another funny circumstance is that Search Engine Land's Tumblr account is ranking on the front page. Barry, I don't think you went back far enough. I think we are back in 2010 territory.

Alan

01/25/2013 06:31 pm

OK, but why would it happen to searchengineland.com? Their backlink profile is awesome.

Michael Martinez

01/25/2013 06:50 pm

It's not really a scraper or "low quality content" issue so much as it appears to be a FRESHNESS + PAGE TITLE issue.

Rank Watch

01/25/2013 06:58 pm

Well noticed, Barry. I can't think why a G+ post with just two +1s is ranking second. If Google really wanted to rank a G+ post for this keyword, it could have been SEL's own G+ post for this article, which is very authoritative, as it would have more +1s and user interactions.

David Castro

01/25/2013 09:23 pm

Different indexes, that's true. But I'd rather see only one result at a time for the same article (and source); I don't care whether it's from Google News or an organic result. What happened to The Verge article is ridiculous. Google doesn't care about the original author or, in this case, what users think about it: the original article has 10K FB likes, while Huff's dupe has only 78 likes...

rankingbyseo.com

01/25/2013 09:23 pm

Did you guys also notice that EMDs are ranking better than older, authoritative sites, even when the EMDs are new to the industry? You can see this with any keyword of your choice.

David Castro

01/25/2013 09:31 pm

Their article is coming up first, Google chose to show the news result instead of the organic one.

Aaron Alexander

01/25/2013 09:57 pm

Google does care, gals and guys... about the sense that dollars make! When hopeful affiliate noobs get hold of this info, they'll be spinning EMDs and scraping content with Fiverr likes... till a Pandguin army is needed :[

Kevin

01/25/2013 09:58 pm

I see this as a #1 ranking for Search Engine Land. They are serving it up as news because of its recency, to drive traffic to their news section and cross-market it. HuffPost is referencing its own page as canonical, which is arguable since they are now only stealing the title tag. It looked to me like the link was followed, but maybe someone else can take a look. As for the ranking for that search phrase, the exact title of the original source was "for amusement only: the life and death of the american arcade," as opposed to HuffPost's title, "life and death of the american arcade," which is a closer match. Searching for "for amusement only the life" returns The Verge at the top and does not return HuffPost. I think we can all relax and enjoy our weekends: the search phrase was not an exact match for the original content's title.

John Doherty

01/25/2013 10:02 pm

What does that even mean? How can it be a page title issue when the page titles are the same and time published is within a few hours of each other?

kevin

01/25/2013 10:04 pm

I guess they are not even stealing the title tag in this case either, only abbreviating it. That may have been misleading.

Kevin Fleming

01/25/2013 10:42 pm

One thing I did notice here: we can infer that there may be an advantage to taking the added text (brand name) out of your subpage titles to make them more targeted. Google can tell what your brand is from your /index.html title, and the added duplicate text reduces the percentage of the title that is relevant to the targeted search query.

Mark Traphagen

01/25/2013 11:32 pm

This happens within Google+ itself. A reshared post by my profile is often able to outrank the original post (and often be the only one showing) in Google search for a search of the OP's first several words (which Google appears to treat like a title tag). I posted about this at http://www.virante.org/blog/2013/01/10/google-plus-post-rank-hijacking/

sestuff

01/26/2013 12:41 am

I get your point on FB likes, and in a perfect world this would work, but Google has said that it doesn't use social signals all that much. I don't really blame Google on that one, especially since people can buy social likes and there are many barriers with them.

MonopolizedSearch

01/26/2013 12:51 am

The good news is that Bing has this one right. It's terrible that Google is so concerned about monetizing our work that they have lost all concern for the publishers. Dollar signs must be blinding Google's interpretation of what is right and what is wrong. Or is it that Google simply does not care much about organic search anymore?

Takeshi Young

01/26/2013 12:55 am

This is especially puzzling since all the "scrapers" link back to the original source. Google should easily be able to follow the links and determine the canonical version of the article.

seo services - go4seoindia

01/26/2013 05:08 am

Really, it's all mixed up. Nobody can tell 100% what is going on, but I hope Google does what is good for buyers.

John

01/26/2013 09:05 am

Google says: no medical degree, no ranking for health sites.

Nicolas Andrews

01/26/2013 10:52 am

That's unfair...

Dr. Rajesh Moganti

01/26/2013 11:34 am

What does "no medical degree" mean? That without a G+ association Google won't rank health sites?

newyorker_1

01/26/2013 12:39 pm

Scrapers are dominating all the way. Google lost this battle a long time ago.

Matt Morgan

01/26/2013 02:53 pm

It's like Google's hard drive crashed and their backup service has been turned off for 2 years :)

James

01/26/2013 05:40 pm

The bad news is that most people don't use Bing.

MonopolizedSearch

01/26/2013 05:45 pm

I'm sure that you knew Google does not favor its own products in organic search? :) Too bad the FTC dropped the ball and put the burden on millions of businesses that rely on diversity and fairness in search when it let Google off the hook. We really need something to change so places like Huffington rePost can't benefit from our hard work. Google is culpable in allowing the monetization of our stolen work by elevating stolen copies/excerpts in the SERPs. It's clear that content is not king and that domain authority is what matters most to Google. Google needs to turn the dial down on domain authority and let content rank on its own merit. It's BS and borders on criminal when Google aids and assists those who have stolen our work/excerpts. Shame falls not only into the lap of Huffington rePost and Google, but also into that of the FTC for being persuaded by Google's lobbying might. Our economy and future innovation will surely suffer when authors are shunned by a search engine that favors certain domains/brands.

Alan

01/26/2013 06:44 pm

Totally agree. Google used other people's content to build a great business. We were happy because Google would send traffic through to us content creators. More and more, Google is holding onto that traffic itself. The gentlemen's agreement has been well and truly breached.

Jaan Kanellis

01/26/2013 09:55 pm

Also, the backlink profiles of the other sites that are ranking are incredibly strong, but yes, I agree with David.

Ken Fobert

01/27/2013 03:47 am

lol, I was just as confused

Gregory Smith

01/27/2013 04:02 am

It's much more an issue for page titles and freshness. I couldn't agree more!

Roie S

01/27/2013 09:19 am

Bing actually does a better job than Google.

Seo Hop

01/27/2013 03:27 pm

It definitely should, Barry. The rule of thumb is supposed to be that even if you publish just one second before the other website, your site should show up first. Furthermore, each page has a timestamp, and I'm fairly sure Google can read it, so they definitely know which one is the original.

Kelli Brown

01/28/2013 09:38 am

Interesting follow up on Poynter.org - once again, the entire discussion misses the role search engines play. http://www.poynter.org/latest-news/mediawire/201546/huffpost-trims-aggregated-post-after-the-verge-complains/

Govind Singh

01/28/2013 10:54 am

Agreed. I've seen many crappy content sites rank very well while good websites get penalized in Google.

CarAutoDriver

01/28/2013 11:33 am

I have seen scraped articles ranking at the top many times. Honestly, most of the time these articles don't make sense at all.

Guest

01/29/2013 07:22 am

My last comment isn't showing, so I'm trying to put my words down again with the same emotions. Since the last unconfirmed update on 17 Jan, the search results are even worse than they were in 2010. Sites like Tumblr, LinkedIn, Facebook, Blogspot, digsitevalue, subdomains, EMDs, sites without www, sites with no content, "coming soon" pages, pages returning server errors, and brand-new businesses' sites were pushed up, while sites that had ranked at the top for years and years, unhurt by any Google Panda or Penguin update, were pushed down. Google's motive in pushing down sites that have ranked for decades is to force them to use AdWords to stay in business. Its motive in pushing up sites that were not ranking anywhere is to give them free business for a year or so and then drop them suddenly. In the end, they are left with only two options: opt for AdWords or go out of business.

Ben Guest

01/30/2013 02:34 pm

Are we really going to believe in a stagnant or static search engine? Indexing is an ongoing process, which means sometimes you have to wait for it to sort itself out. It doesn't take much to understand Google's freshness algorithm. It's probably the easiest one to manipulate. IMHO, anyway.
