The Lifetime Value of Links Based on Google Webmaster Central

Apr 27, 2007 • 2:14 pm | Filed Under Google Search Engine Optimization

Ever since Google expanded Webmaster Central to include a link analysis tool, we have been collecting the raw data to analyze later. Last month, we looked at some February and March link data. This month, let's look at April and put it up against February and March.

This month, only three of the top ten most-linked-to pages made Digg's front page.

Tamar did an excellent job organizing the data for me to look at and analyze a bit more. Here is our most recent linkage data from Google Webmaster Central's link tool. One thing that stands out about this data is that the first article has 15,426 more links than the second most-linked-to article - that is huge. Anyway, here are the most-linked-to articles, based on April's linkage data.

April 2007 Linkage Data (links per article)
75% of Google's Blogspot Blogs are Spam 18,368
Roundtable Coverage of the Search Engine Strategies New York 2007 Show 2,942
Yahoo! Removes Category (Directory) Links From Under Search Results 1,648
First Screenshots of Google Pay Per Action in Action 1,318
Listing of Some Free Keyword Suggestion Tools 1,309
Google Wins in Kinderstart Lawsuit 1,210
Seeing Google Pay Per Action in Action 1,207
Microsoft Live Search Link Command Operator Offline 717
Google Sending Out More Google Coolers to AdWords Advertisers 698
Can You Place Google AdSense Ads on 404 Pages & Thank You Pages? 659

To remind you, here is the March linkage data that we posted previously. It may look like some duplicates were posted but, trust me, this is how the data was exported. So even though we have the same article listed twice, that is how we got it from Google in the CSV. Why? I do not know. (A quick sketch of how such duplicate rows could be merged follows the table.)

March 2007 Linkage Data (links per article)
Google Sending Out More Google Coolers To AdWords Advertisers 1,961
Listing of Some Free Keyword Suggestion Tools 1,391
Google Allows AdSense Publishers to Click Play Button on Video Ads 1,073
A Conversation With Google CEO Eric Schmidt 966
AJAX & Search Engine Optimization (SEO) 938
Google Allows AdSense Publishers to Click Play Button on Video Ads 878
Google AdSense Overview Page Goes Blank For 30 Minutes 779
How Does The Yahoo! Directory Rank Sites? 752
Google Maps Sends Health Emergencies To Wrong Location 752
Search Pulse 20... 633
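
As a quick aside, here is a minimal sketch (in Python) of how those duplicate rows could be merged before comparing months. The file name and column headers ("Page", "Links") are my assumptions, not the actual export format:

```python
import csv
from collections import defaultdict

# Merge duplicate article rows from the Webmaster Central CSV export.
# The file name and column headers ("Page", "Links") are assumptions;
# adjust them to whatever the actual export uses.
totals = defaultdict(int)
with open("march_2007_links.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals[row["Page"].strip()] += int(row["Links"].replace(",", ""))

# Print the merged counts, highest first.
for page, links in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{links:>6,}  {page}")
```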

Here is the February data. You will see only the top nine, because that is all we had at the time. Also, Google themselves told us the data was not yet 100% complete, so it may not be wise to lean on it too heavily in our analysis.

February 2007 Linkage Data (links per article)
A Conversation With Google CEO Eric Schmidt 844
Screen Shot Of Quality Score Metric in AdWords Console 736
Microsoft Banning Sites from Live.com For Link Exchanges 516
Vanessa & Adam Working on Christmas Day 514
Google AdSense Competitive Ad Filter Not Working? 503
Making Bidding Mistakes at Google AdWords ($0.10 Vs. $10.00) 479
Dynamic Keyword Insertion in Your URLs With Google AdWords 472
Programming Note: Vacation Until 1/17 433
Google Toolbar PageRank Update Being Reported 426

The first thing you notice when you plot this on a chart is that as articles get older, they have fewer links (most of the time). It is understandable that newly written articles will attract a lot of fresh links. However, in many of the examples we listed above, the link counts have dropped by more than half.

A quick line chart of the articles that had the most links in February shows the downward trend in links as the articles age.

[Chart: reported link counts for February's top articles, declining as the articles age]
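
For anyone who wants to redraw a chart like this, here is a minimal sketch using Python and matplotlib. Since February's top nine do not reappear in the later tables, it plots the two articles that show up in both the March and April data above:

```python
import matplotlib.pyplot as plt

# Reported link counts for the two articles that appear in both the
# March and April tables above. (February's top nine do not reappear
# in the later exports, so they cannot be tracked the same way.)
months = ["Mar 2007", "Apr 2007"]
series = {
    "Google Coolers to AdWords Advertisers": [1961, 698],
    "Free Keyword Suggestion Tools": [1391, 1309],
}

for title, counts in series.items():
    plt.plot(months, counts, marker="o", label=title)

plt.ylabel("Links reported by Webmaster Central")
plt.title("Reported link counts by month")
plt.legend()
plt.show()
```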

Why is this? Why do old articles have fewer links over time? Here are some possible ideas:

  • Pages with the links are deleted
  • Pages with the links are password protected
  • Pages with the links are moved into the supplemental index
  • Pages with the links are temporarily offline

But honestly, this did not satisfy me. First, does Google not count links that are from pages within the supplemental index? I was thinking about testing this, but the problem is, we do not know how recent this data is. Google pushed us our linkage data based on an archived copy of their index. A page can be in the supplemental index at any point, so it is hard to say, without a shadow of a doubt, that a page was in the supplemental index at point A. We don't know when the link counts were calculated, so we cannot produce that evidence.

One weird observation was that the majority of the most-linked-to pages for the months above were published on or around the same date. Five of the top nine articles in February's data were from December 25th or 26th, 2006. Six of the top ten articles from the March data set were from February 20th and 21st, 2007. And a majority of April's most-linked-to articles were from on or around March 26th.

This seems to add more evidence that Google finds new links quickly and then stops counting those links as time goes on. Maybe Google weeds out links after a certain amount of time? Maybe those links are dropped for the reasons listed above, or maybe there is a new filter in place? I do not know. The data tells me that, across the top articles from all the months' export files, there is a 34.04% latency in the links over one month and a 17.67% latency over two months. But there are some outliers that tend to skew those results.
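
To make that arithmetic concrete, here is the plain month-over-month calculation for one article from the tables above. The exact formula behind the aggregate latency figures is not spelled out here, so this is just the per-article version:

```python
# Plain month-over-month arithmetic for one article from the tables
# above. The post's aggregate "latency" percentages come from an
# unstated formula, so this only shows the basic per-article math.
march, april = 1961, 698  # "Google Coolers" article: March -> April

retained = april / march            # fraction of links still reported
dropped = 1 - retained              # fraction that disappeared
print(f"retained: {retained:.1%}")  # retained: 35.6%
print(f"dropped:  {dropped:.1%}")   # dropped:  64.4%
```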

One of the patent applications we linked to today, Document Scoring Based on Link-Based Criteria, is one that Matt Cutts helped to write. In that document, we see that Google may be looking at, and filtering or scoring on, criteria such as the following (a toy sketch of the first criterion follows the list):

  • Rate of the disappearance of links to a document
  • The time frame over which that disappearance happens
  • The rate of links relative to the freshness of the document
  • and so on... (I am not going to list them all out here)
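
To make the first criterion a bit more concrete, here is a toy sketch of how a link-disappearance rate could be computed from two dated snapshots. To be clear, this is my own illustration, not the formula from the patent, and the URLs are invented:

```python
from datetime import date

# Toy illustration of the first criterion: the rate at which links to a
# document disappear between two dated snapshots. My own sketch, not the
# patent's formula; the URLs are made up.
snapshot_a = {"site1.com/post", "site2.com/roundup", "site3.com/links"}
snapshot_b = {"site1.com/post"}  # two of the three links have vanished

days = (date(2007, 4, 27) - date(2007, 3, 27)).days  # 31 days apart
disappeared = snapshot_a - snapshot_b
rate = len(disappeared) / len(snapshot_a) / days  # fraction lost per day

print(f"{len(disappeared)} of {len(snapshot_a)} links gone over {days} days; "
      f"rate = {rate:.4f} per day")
```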

Are the links listed in these webmaster reports from Google actually valuable links? We know nofollowed links show up in the tool. So maybe there is zero filtering going on for the newer links? Maybe not. I just don't know.

I was hoping to learn more about how Google looks at links through these reports, but I do not believe I have. Maybe I need to collect more data over time? But I think the main issue is that I do not have the timestamps of when the data was collected by Google, so it is incredibly hard to analyze this data. A month-by-month snapshot is great, but not enough for us to derive any significant conclusions from.


Comments:

SEO Mash

04/27/2007 07:44 pm

Is the decrease in the number of links counted by Google really that nefarious? In general, I am sure that a lot of your links come from blogs. When a topic is new, it will be linked to from the main page of the blog, of course, but also from the most-recent-posts page, the hot-posts page, etc. As time passes, though, you are no longer on the recent or hot pages, obviously. Also, the original blog post itself, which links to you, may well be on page 34 of the blog at that point, which may not have enough link juice to be anywhere other than the supplemental index.

Barry Schwartz

04/27/2007 08:12 pm

Agreed. But do we know Google doesn't count links from sites in the supplemental index? Also, do pages jump to the supplemental index that quickly?

cybernezumi

04/27/2007 10:02 pm

Similarly, one effect I've seen is in people's Digg histories (and if you've gotten a front page story, a lot of the pages in your link tool are these history pages). As they digg more stuff, you can quite rapidly move from history page 1 to page 10 -- and Google probably hasn't gotten around to reindexing page 10 yet, so your link disappears.

Barry Schwartz

04/27/2007 10:31 pm

But that is one link, not 80% of your links?

jonah stein

04/28/2007 06:37 am

Barry: I imagine you are seeing a combination of things. 1. Blog posts on sites talking about your story and linking to it get syndicated and scraped, but as soon as the story isn't recent, a lot of these scrapers don't archive it, so the links disappear. 2. WordPress tends to create multiple URLs to the same content while the spider tries to find the canonically correct page. A percentage of people link back to the /blog root instead of the specific URL, so when the story is replaced, the link count goes down as the spiders re-evaluate what the correct "base" URL is. 3. Crowdsourcing sites and social media appear to keep stories for less than 30 days, so those links expire.

Michael Martinez

04/30/2007 09:18 pm

Without knowing which links come from Supplemental and non-Supplemental pages, without knowing which links come from value-passing pages, and without knowing which links come with which anchor text, your data tells you virtually nothing useful. Also, Vanessa Fox said on at least one occasion that they were working on expanding the amount of link data they publish. When the link data was first put up, Vanessa and Matt Cutts both pointed out that the link data was incomplete. Hence, your trend analysis is premature.

Barry Schwartz

04/30/2007 09:22 pm

Michael, I guess you did not read what I wrote. I said because of those issues, I could not come to any conclusion. Did you miss that?
