Top SEOs Analyze Glorified Scraper Sites After May Day

Jul 13, 2010 • 8:44 am | Filed Under Google Search Engine Optimization

WebmasterWorld administrator Tedster posted a thread that takes a deeper look at the May Day update by examining sites that should have been impacted by the update but were not.

Tedster does something you rarely see in a WebmasterWorld thread: he picks apart a specific site that is doing well. Then some really well-known and respected SEOs come in and discuss why those sites are doing well after the May Day update while others are not.

He posted, in part:

When Mayday first dropped on us, there was a sudden INCREASE in rankings for mash-up sites. You can see examples of what I'm talking about, and Alexa shows their increases in traffic.

These are often sites with some substantial financing, and even relatively famous owners or CEOs. But to my view, they are a plague on the web and in no way offer the "better long tail results" Google was aiming for.

As one example, do a Google search for webmasterworld. I currently see 297 pages built from bits and pieces of our content. Try it for other domains and you often see much the same thing.

The goal: figure out why these sites are doing well in Google and replicate it so your scraper can do well also.

Here are some, though not all, of the responses on what these top SEOs feel is working for these sites:

The site in question does do quite a bit of linking out to other sites that provide additional information within the mashed-up content. Clicking the link goes to the site the content was ripped from. All the links are nofollowed, so I wonder: is this something we need to take another look at?

Adding more outgoing links to provide more information on the subject/product the page is about.

Daymix doesn't just scrape the Google SERP pages and lift the titles and descriptions of the highest-ranking/most-relevant pages for a query, though, the way scrapers used to. It emulates Universal Search and scrapes the highest-ranking/most-relevant sources for the different types of data that make up a Google SERP page.

Daymix displays a mixture of web, news, blogs, images, videos, Twitter content, etc., and it's good enough, e.g., to know when Twitter content might be appropriate for a query and when not, and which are the most authoritative sites in a given field to scrape. Apparently, the vocabulary and media mix is attractive to Google.

He then says this is similar to Google Place Pages.

I took a look at the Daymix site, and one thing I noticed is that when I looked at a result that specifically brought up my site (I have a script that displays the user agent of the visitor), the user agent was listed as googlebot. So I can confirm that they are either scraping Google SERPs or changing their robot's name and not obeying robots.txt.
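The check described in that quote can be sketched roughly as follows. This is a minimal illustration, not the forum member's actual script, and the user-agent strings and function name are my own assumptions: it classifies a visitor's User-Agent header so a site owner can spot crawlers identifying themselves as Googlebot.

```python
# Minimal sketch (hypothetical, not the forum member's script): classify a
# visitor's User-Agent string to spot crawlers claiming to be Googlebot.

def classify_user_agent(ua: str) -> str:
    """Return a rough label for a visitor based on its User-Agent header."""
    ua_lower = ua.lower()
    if "googlebot" in ua_lower:
        # Anyone can send this string; a claimed Googlebot should be
        # verified separately (e.g. a reverse-DNS lookup of the IP).
        return "claims-googlebot"
    if any(tag in ua_lower for tag in ("bot", "crawler", "spider")):
        return "other-bot"
    return "browser"

if __name__ == "__main__":
    print(classify_user_agent(
        "Mozilla/5.0 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"))  # claims-googlebot
    print(classify_user_agent(
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36"))  # browser
```

Note that the User-Agent string alone proves nothing, since a scraper can send any string it likes; that is exactly the ambiguity the forum member raises ("or they are changing their robots names").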

Finally, Aaron Wall:

They fund a lot of the duplication... and they need to focus more on ways to promote and subsidize the cost of quality content and encourage it. Minimizing the role of the scrape-and-mash game would be a big step in that direction.

But if end users don't know the difference (and don't understand the business connections) does it harm Google to make the media ecosystem weaker and more desperate for negotiations? I see it as the strategy of funding a third party to make a future partner weaker so you have more leverage at the bargaining table. But some might claim that is a cynical way of looking at it :D

This discussion is only going to get better, so keep an eye on the thread.

Forum discussion at WebmasterWorld.

Update: I didn't realize this thread was private. Now that I've posted the quotes, removing them doesn't make sense (they are already out there in the feeds). Trust me, there is a ton more discussion in the thread; I only pulled out excerpts. This is one more reason to become a paid member of WebmasterWorld.
