Google: Aggregated Statistics Is Not Great Content

Jul 12, 2012 • 9:15 am | Filed Under Google Search Engine Optimization

I've been following a Google Webmaster Help thread about a site that offers statistics on hockey-related content and players. The site dates from 1998 and is supposedly well respected in the industry.

The webmaster said his site tanked on April 24th, when the Penguin algorithm was released.

He is not sure why, and he argues that he has quality content on his site, just in statistical form. There is little article-style content written in paragraphs; instead, the pages are statistics pulled from a database.

He said that, for his users, the content is exactly what they want. But Google seems to want more article-style content, and he doesn't like that fact.

Google's John Mueller basically agreed and said he probably needs more than just stats on the page. John said:

Thanks for posting all of these details. Looking at your site, I don't see any specific technical issues, or general issues with the links to your site. I can, however, imagine that our algorithms might have some trouble understanding the unique value of your website in comparison to other, similar sites (especially considering that the content is primarily aggregated statistics). My general recommendation would be to continue working on your website, making it the best site of its kind. There's no single change that you'd need to make, so I'd really look at your site overall and see where you could make improvements on a general level -- you mentioned that you might have some thin pages, perhaps that's a place to start (or at least, to try things out with A/B tests, etc).

It doesn't seem to me that this site is a typical Penguin victim, but maybe lots of pages with lots of numbers triggered a Penguin-related issue? I am not sure.

But if you run stats-related web pages, make sure you have more content than just numbers on the page.

This is kind of ironic, given that Google is such a number-centric company.

Forum discussion at Google Webmaster Help.

Image credit to BigStockPhoto for stats head.

Mike Kalil

07/12/2012 02:28 pm

Maybe if they just hired someone to write a few paragraphs about each player and team, the issue would be solved. Or perhaps even auto-generated content would do the trick, since all the stats are unique on each page. That wouldn't be too hard to accomplish.


07/12/2012 03:10 pm

Barry, I would not recommend adding rich text to the pages. Here's why, based on my experience: doing so dilutes the data (in the majority of cases), so the site loses footing in other search engines without seeing any notable recovery in Google Search. Other experiences may vary. I would like to hear from those who have succeeded in this area.

Lyndon NA

07/12/2012 03:18 pm

That's been a major flaw with Google for years - and one that I used to kick up over no end. On the one hand, G says don't write for robots, then kicks you to the curb when you don't provide content specifically to pass muster with the darn GoogleBot. The only thing they can realistically do is not attempt to rank for the direct stats. They need 2 sets of content: 1) the stats, and 2) the writeups. Then link from (2) to (1). They may want to consider setting the stats as NoIndex, or slapping them into a subdomain, etc. But they are in for a rough ride for the next few months (min.) whilst G re-evaluates the site and decides that it has not only got valid content, but that enough time has passed for G to see it in a favourable (trusted) light again.

Scott McKirahan

07/12/2012 04:48 pm

No, that wouldn't seem to be a Penguin related problem; it is far more likely to be Panda.

Ralph Slate

07/13/2012 05:59 am

Barry - Ralph Slate here, founder of the site in question. I'd like to elaborate on some points brought up. First, I would also agree that this sounds more like Panda than Penguin, but the dropoff was precisely on April 24, which points to Penguin. That is what is so puzzling. You can see my graph, with the steep dropoff, in the original forum post, which you have linked to. I've been doing some in-depth server log analysis to see what was impacted, but it's hard for a number of reasons:
- I literally have several hundred thousand pages: a profile page for 150,000+ players, plus a roster/scoring page for each team/season of hockey, plus pages for standings, team/league history, logos, trading cards, drafts, awards, etc. The advantage of a database is that it is not canonical; the data can be assembled many ways.
- Google traffic tends to follow the news cycle, so a player may leap into the spotlight and be gone a day later. I had 137 referrals for Alex Leavett on 4/23. On 4/27 I had just 1, but when I searched today, I'm still on page #1. His 15 minutes simply peaked.
- Google sends different results to the US versus Canada. Search for Jimmy Hayes on one country's Google and I'm the #1 spot. Search on the other and I'm #11.
- The first round of the NHL playoffs ended on April 26, and no Canadian teams made it to round #2 (the dropoff was too steep and permanent for this to account for the big loss though).
- My site is incredibly long-tail. On April 23, Google referred users to 9,401 different player profiles. 5,558 (59%) of those profiles were single-profile referrals (meaning the player was referred just once by Google). On April 27, the share of single referrals was 57%, but the total number of different player referrals was just 6,142, a 35% drop.
I think the long tail is the area I'm losing out on and "thin content" is the most reasonable explanation, but again, that's Panda, not Penguin.
But I think that the "basket" approach might have gone in with Penguin - Matt Cutts alluded to it by saying "Google would seek to detect that there is no real differentiation between these results [an example of frogs] and show only one of them so we could offer users different types of sites in the other search results." This went beyond duplicate content; it is as if Google has broadly classified sites. John Mueller alluded to this too by calling my site "primarily aggregated statistics". While that term describes my site in a broad sense, that description really sells my site short.
Another thing I am seeing quite a bit of is my result pinned at the #11 spot - the first result on page #2. This seems too frequent to be coincidental. It indicates a possible algorithmic penalty to me, as if Google is saying "this page should rank on the first page, but don't put it there". That seems like an add-on penalty too, which would imply Penguin. Search for [John Weedon hockey] and you'll see how odd the results are - the top pages are from people-finder sites that scraped their information from my site. My site is #11.
While I agree that a few paragraphs about each player would make my site better, that just isn't really practical - I have 150,000 players in the database, many of whom played several decades ago. I don't allow user-generated content because I am an authority site and I have learned that people will lie about their history if given the chance. I am a niche site that focuses on comprehensive statistical history. I am not trying to chase the competitive hockey players - there have been only about 7,000 players to play in the NHL, and those are the players every other site has focused on. I want to be just as focused on the other 143,000 players because those are the players nobody knows about.
The one big change I have made is to add more "related information" to the player profiles, to beef them up somewhat and to offer some shortcut links to other related information pages. The side effect is that there will be more words on the page (instead of just numbers). We'll see if that has any impact.
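The long-tail figures quoted in the comment above can be reproduced with a short calculation. The following is only a minimal sketch, not the commenter's actual log analysis: it assumes the referrals are already available as a plain list of profile identifiers, one entry per Google referral, and the function names `long_tail_summary` and `percent_drop` are hypothetical.

```python
from collections import Counter


def long_tail_summary(referred_profiles):
    """Summarize one day's referrals (hypothetical helper).

    referred_profiles: iterable of profile identifiers, one entry per referral.
    Returns (distinct_profiles, single_referral_profiles, single_share).
    """
    counts = Counter(referred_profiles)
    distinct = len(counts)
    # Profiles that received exactly one referral that day.
    singles = sum(1 for n in counts.values() if n == 1)
    share = singles / distinct if distinct else 0.0
    return distinct, singles, share


def percent_drop(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


# Figures taken from the comment: 9,401 distinct profiles on April 23,
# 5,558 of them single referrals, and 6,142 distinct profiles on April 27.
single_share_apr23 = 5558 / 9401 * 100
drop_in_distinct = percent_drop(9401, 6142)
print(f"single-referral share: {single_share_apr23:.0f}%")
print(f"drop in distinct profiles: {drop_in_distinct:.0f}%")
```

Run against the quoted numbers, this agrees with the 59% single-referral share and the 35% drop in distinct profiles reported in the comment.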


07/13/2012 07:17 am

Google does not want visitors to be able to choose for themselves. Google wants only what the webspam team wants (and allows) for them, and it limits what they can see on the web. It is the work of the precious, smart, god-made Google algorithms, and even Matt Cutts cannot change it, because he is not a programmer but a blah-blah politician.


07/13/2012 07:24 am

Sorry to say, but it looks like your site is not an authority site anymore. Google has its own idealistic ideas about what is better for people. Content does not matter here, just some unknown signals.


07/13/2012 11:44 am

This is just not fair. Such a great resource is being tanked for an unknown reason. I also feel that John Mueller feels sorry for the collateral damage this site obviously is, because there is no obvious reason for a penalty. I never bought "thin content" as a signal, because if that mattered we wouldn't have any "single page" spammers on the first page of SERPs like we have today. There is a lot of collateral damage with every update that SEO people try to explain, although there is no explanation really...


07/13/2012 02:34 pm

Wow...just wow. A really unfair situation. As to 'thin content', if that was one of Google's hot buttons, then why are there so many search results that contain three sentences or so of scribble that rank at the top of searches?


07/14/2012 05:05 am

Google is making a mess with Penguin & Panda; they are spoiling their own source by themselves.

