Top SEOs Analyze Glorified Scraper Sites After May Day

Jul 13, 2010 • 8:44 am | comments (9) | Filed Under Google Search Engine Optimization

WebmasterWorld's administrator, Tedster, started a thread at WebmasterWorld that takes a deeper look at the May Day update by examining sites that should have been impacted by the update but were not.

Tedster does something you rarely see in a WebmasterWorld thread: he picks apart a specific site that is doing well. Then some really well-known and respected SEOs come in and discuss why those sites are doing well after the May Day update while others are not.

He posted, in part:

When Mayday first dropped on us, there was a sudden INCREASE in rankings for mash-up sites. You can see examples of what I'm talking about at, and and Alexa shows their increases in traffic.

These are often sites with some substantial financing, and even relatively famous owners or CEOs. But to my view, they are a plague on the web and in no way offer the "better long tail results" Google was aiming for.

As one example - do a Google search for webmasterworld - I currently see 297 pages built from bits and pieces of our content. Try it for other domains and you often see much the same thing.

The goal: figure out why these sites are doing well in Google and replicate it so your scraper can do well also.

Here are some, not all, of the responses on what some top SEOs feel is working for these sites:

The site in question does do quite a bit of linking out to other sites that provide additional information within the mashed up content. Clicking the link goes to the site the content was ripped from. All the links are nofollowed so I wonder is this something we need to take another look at.

Adding more outgoing links to provide more information on the subject/product the page is about.

Daymix doesn't just scrape the Google serps pages and lift the titles and descriptions of the highest ranking/most relevant pages for a query, though, the way scrapers used to. It emulates Universal and scrapes the highest ranking/most relevant sources for different types of data that make up a Google serps page.

Daymix displays a mixture of web, news, blogs, images, videos, Twitter content, etc... and it's good enough, eg, to know when Twitter content might be appropriate for a query and when not; and what the most authoritative sites in a given field are to scrape. Apparently, the vocabulary and media mix is attractive to Google.

He then says this is similar to Google Place Pages.

I took a look at the daymix site, and one thing I noticed is that when I looked at a result that specifically brought up my site, I have a script that displays the user agent of the visitor, and that the user agent is listed as googlebot. So I can confirm that they are scraping Google SERPS, or they are changing their robots names, and not obeying robots.txt.

Finally, Aaron Wall:

They fund a lot of the duplication...and they need to focus more on ways to promote / subsidize the cost of quality & encourage it. Minimizing the role of the scrape and mash game would be a big step in that direction.

But if end users don't know the difference (and don't understand the business connections) does it harm Google to make the media ecosystem weaker and more desperate for negotiations? I see it as the strategy of funding a third party to make a future partner weaker so you have more leverage at the bargaining table. But some might claim that is a cynical way of looking at it :D

This discussion is just going to get better, so keep an eye on the thread.

Forum discussion at WebmasterWorld.

Update: I didn't realize that this thread was private. Now that I posted the quotes, removing them doesn't make sense (it is out there in the feeds already). Trust me, there is a ton more discussion in the thread. I only pulled out excerpts from the thread - so this is one reason to become a paid member of WebmasterWorld.



07/13/2010 02:24 pm

That's a private thread at WMW and you have to be a special member to see it. Not sure you should be posting quotes publicly. It may cause people to not be as frank.

Barry Schwartz

07/13/2010 02:31 pm

ILuvMahalo, I am sorry. I really didn't realize it was private. In the 6+ years of doing this, I really never slipped up so bad. I updated the post. I feel dumb.


07/13/2010 03:28 pm

Barry, you do an awesome job and are part of my daily read. Let me take this opportunity to thank you for all I have learned from your digging up nuggets in SEO forums everyday.


07/13/2010 03:36 pm

"Top SEOs Analyze Glorified Scrapper Sites After May Day" You mean "Scraper". A "scrapper" is a fighter, or a pugnacious individual. You are talking about scraper sites.

Barry Schwartz

07/13/2010 03:40 pm

Thanks guys! I guess losing two hours of sleep unexpectedly does have an impact... I really feel bad about this... Really.


07/13/2010 04:03 pm

Pay attention to what links pass link juice from daymix. The whole point of daymix is to create links for its other blog websites. Go to . Notice the blog links? They own all of those blogs. That's this site's main purpose. The fact that it's gotten a spike in traffic is probably giving them a good chuckle.

Connor Bringas

07/13/2010 10:30 pm

Thanks for the post good link! Need another good read so post asap


07/14/2010 07:46 am

I also saw some long tail keyword ranking improvements in Google SERPs. I guess it's a good improvement. Google Places have missing reviews, it happened after their migration from Google LBL.


07/14/2010 09:48 am

Don't stress about it Barry!
