The Lifetime Value of Links Based on Google Webmaster Central

Apr 27, 2007 • 2:14 pm | Filed Under Google Search Engine Optimization
 

Ever since Google expanded Webmaster Central to include a link analysis tool, we have been collecting the raw data to analyze for later purposes. Last month, we saw some February and March link data. This month, let's look at April and put it up against February and March.

This month, only three of the top ten most linked-to pages made Digg's front page.

Tamar did an excellent job organizing the data for me to look at and analyze a bit more. Here is our most recent linkage data from Google Webmaster Central's link tool. One thing that stands out about this data is that the first article has 15,426 more links than the second most linked-to article. That is huge. Anyway, here are the most linked-to articles, based on April's linkage data.

April 2007 Linkage Data (article title, number of links)
75% of Google's Blogspot Blogs are Spam 18,368
Roundtable Coverage of the Search Engine Strategies New York 2007 Show 2,942
Yahoo! Removes Category (Directory) Links From Under Search Results 1,648
First Screenshots of Google Pay Per Action in Action 1,318
Listing of Some Free Keyword Suggestion Tools 1,309
Google Wins in Kinderstart Lawsuit 1,210
Seeing Google Pay Per Action in Action 1,207
Microsoft Live Search Link Command Operator Offline 717
Google Sending Out More Google Coolers to AdWords Advertisers 698
Can You Place Google AdSense Ads on 404 Pages & Thank You Pages? 659

To remind you, here is the March linkage data that we posted in the past. It may look like some duplicates are posted, but trust me, this is how the data was exported. So even though we have the same article listed twice, that is how we got it from Google in the CSV. Why? I do not know.

March 2007 Linkage Data (article title, number of links)
Google Sending Out More Google Coolers To AdWords Advertisers 1,961
Listing of Some Free Keyword Suggestion Tools 1,391
Google Allows AdSense Publishers to Click Play Button on Video Ads 1,073
A Conversation With Google CEO Eric Schmidt 966
AJAX & Search Engine Optimization (SEO) 938
Google Allows AdSense Publishers to Click Play Button on Video Ads 878
Google AdSense Overview Page Goes Blank For 30 Minutes 779
How Does The Yahoo! Directory Rank Sites? 752
Google Maps Sends Health Emergencies To Wrong Location 752
Search Pulse 20... 633
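
The repeated row in the March export can be spotted mechanically. Below is a minimal Python sketch, assuming the rows have already been read into (title, link count) pairs. The helper name `find_duplicate_titles` is hypothetical, the truncated "Search Pulse 20..." row is left out because its full title is not known, and nothing here explains why Google exported the same title twice.

```python
from collections import defaultdict

# Rows copied from the March 2007 table above: (article title, link count).
# The truncated "Search Pulse 20..." row is omitted on purpose.
march_rows = [
    ("Google Sending Out More Google Coolers To AdWords Advertisers", 1961),
    ("Listing of Some Free Keyword Suggestion Tools", 1391),
    ("Google Allows AdSense Publishers to Click Play Button on Video Ads", 1073),
    ("A Conversation With Google CEO Eric Schmidt", 966),
    ("AJAX & Search Engine Optimization (SEO)", 938),
    ("Google Allows AdSense Publishers to Click Play Button on Video Ads", 878),
    ("Google AdSense Overview Page Goes Blank For 30 Minutes", 779),
    ("How Does The Yahoo! Directory Rank Sites?", 752),
    ("Google Maps Sends Health Emergencies To Wrong Location", 752),
]


def find_duplicate_titles(rows):
    """Return {title: [link counts]} for titles listed more than once.

    Hypothetical helper: titles are compared case-insensitively after
    trimming whitespace, and the first spelling seen is used as the key.
    """
    seen = defaultdict(list)
    for title, links in rows:
        seen[title.strip().lower()].append((title, links))
    return {
        entries[0][0]: [links for _, links in entries]
        for entries in seen.values()
        if len(entries) > 1
    }


for title, counts in find_duplicate_titles(march_rows).items():
    print(f"{title}: listed {len(counts)} times with counts {counts}")
```

Whether the two counts should be added together or kept apart is a judgment call, so the sketch only reports them.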

Here is the February data. You will see only the top nine, because that is what we had at the time. Also, Google themselves told us that the data was not 100% complete yet, so it may not be wise to lean too heavily on this data in our analysis.

February 2007 Linkage Data (article title, number of links)
A Conversation With Google CEO Eric Schmidt 844
Screen Shot Of Quality Score Metric in AdWords Console 736
Microsoft Banning Sites from Live.com For Link Exchanges 516
Vanessa & Adam Working on Christmas Day 514
Google AdSense Competitive Ad Filter Not Working? 503
Making Bidding Mistakes at Google AdWords ($0.10 Vs. $10.00) 479
Dynamic Keyword Insertion in Your URLs With Google AdWords 472
Programming Note: Vacation Until 1/17 433
Google Toolbar PageRank Update Being Reported 426

The first thing you can see when you plot this on a chart is that as articles get older, they have fewer links (most of the time). It is understandable that new articles one writes will attract a lot of fresh links. However, in many of the examples we listed above, the links have dropped by more than half.

A quick line chart of the articles that had the most links in February shows the downward trend of links as the articles age.

[Picture 1: line chart of link counts for February's top articles]
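
For the few articles that appear in more than one of the tables above, the month-to-month change can be worked out directly. This is a small Python sketch using only the counts printed in this post. The function name `percent_change` and the way the rows are paired up are my own assumptions, not something that comes from the Google export.

```python
def percent_change(old, new):
    """Percentage change from old to new; negative means links were lost."""
    return (new - old) / old * 100.0


# (article title, earlier count, later count), taken from the tables above.
march_to_april = [
    ("Google Sending Out More Google Coolers to AdWords Advertisers", 1961, 698),
    ("Listing of Some Free Keyword Suggestion Tools", 1391, 1309),
]
february_to_march = [
    ("A Conversation With Google CEO Eric Schmidt", 844, 966),
]

for label, rows in (("March -> April", march_to_april),
                    ("February -> March", february_to_march)):
    for title, old, new in rows:
        change = percent_change(old, new)
        print(f"{label}: {title}: {old} -> {new} ({change:+.1f}%)")
```

On these numbers the Google Coolers article loses well over half of its links in a month, while the Eric Schmidt article actually gains links between February and March. That may simply reflect the incomplete February data mentioned earlier.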

Why is this? Why do old articles have fewer links over time? Here are some possible ideas:

  • Pages with the links are deleted
  • Pages with the links are password protected
  • Pages with the links are moved into the supplemental index
  • Pages with the links are temporarily offline

But honestly, this did not satisfy me. First, does Google not count links from pages that are in the supplemental index? I was thinking about testing this, but the problem is that we do not know how recent this data is. Google gave us our linkage data based on an archive copy of their data. A page can be in the supplemental index at any point, so it is hard to say beyond a shadow of a doubt that a page was in the supplemental index at point A. We don't know when the links were calculated, so we cannot come up with that evidence.

One weird observation was that the majority of the most linked-to pages for the months above fell on or around the same date. Five of the top nine articles in February's data were from the 25th or the 26th of December 2006. Six of the top ten articles from the March data set fell on February 20th and 21st of 2007. And a majority of the April most linked-to articles fell on or around March 26th.

This seems to add more evidence that Google finds new links quickly and then doesn't count those links as time goes on. Maybe Google weeds out links after a certain amount of time? Maybe those links are dropped for the reasons listed above, or maybe there is a new filter in place? I do not know. The data tells me that of the top articles from all the months' export files, there is a 34.04% latency in the links over one month and a 17.67% latency over two months. But there are some outliers that tend to skew those results.
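
To show how a single outlier can skew an average like this, here is a rough Python sketch that compares the mean and the median of the one-month changes. It does not reproduce the 34.04% figure, which came from the full export files. It only uses the three articles that show up in two adjacent tables in this post, and the function name `summarize_changes` is made up for the example.

```python
from statistics import mean, median

# (earlier count, later count) for articles seen in two adjacent months above:
# Google Coolers (March -> April), Keyword Suggestion Tools (March -> April),
# Eric Schmidt conversation (February -> March).
one_month_pairs = [(1961, 698), (1391, 1309), (844, 966)]


def summarize_changes(pairs):
    """Return (mean, median) of the percentage changes for (old, new) pairs."""
    changes = [(new - old) / old * 100.0 for old, new in pairs]
    return mean(changes), median(changes)


avg, mid = summarize_changes(one_month_pairs)
print(f"mean one-month change:   {avg:+.2f}%")
print(f"median one-month change: {mid:+.2f}%")
```

With only three articles, the large drop on the Google Coolers article pulls the mean far below the median, which is the kind of skew a few outliers can cause in a small sample.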

One of the patent applications we linked to today is on Document Scoring Based on Link-Based Criteria that Matt Cutts helped to write. In that document, we see that it is possible Google may be looking at and filtering or scoring criteria such as:

  • Rate of the disappearance of links to a document
  • The time that takes
  • The rate of links relative to the freshness of the document
  • and so on... (I am not going to list them all out here)

Are the links we see in these webmaster reports provided by Google valuable links? We know nofollowed links show up in the tool. So maybe there is zero filtering going on for the newer links? Maybe not. I just don't know.

I was hoping to learn more about how Google looks at links through these reports, but I do not believe I have. Maybe I need to collect more data over time? But I think the main issue is that I do not have the time stamps of when the data was collected by Google. So it is incredibly hard to analyze this data. A month-by-month snapshot is great, but it is not enough for us to derive any significant conclusions.
