Dangling Links & Google's PageRank

Jul 31, 2007 • 7:29 am | comments (2) | Filed Under Google Search Engine Optimization
 

EGOL started a thread at Cre8asite Forums asking how using the various robots.txt commands or META noindex/nofollow directives will impact the flow of PageRank and link popularity.

Several members go through some detailed examples and what-if scenarios. However, Ammon sums it up nicely.

Ammon explains that links pointing to a page that is not in the Google index won't count. Why? Well, the original PageRank document has a concept called "dangling links," which reads:

Dangling links are simply links that point to any page with no outgoing links. They affect the model because it is not clear where their weight should be distributed, and there are a large number of them. Often these dangling links are simply pages that we have not downloaded yet.

Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated. After all the PageRanks are calculated they can be added back in without affecting things significantly.

This may be a classic example of what a "dangling link" is. And if this original paragraph in the PageRank paper is still valid, then there is your answer. But is it still valid?
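For those who want to see the mechanics, here is a minimal Python sketch of that passage as I read it: dangling pages are held out of the iterative calculation and only receive a score at the end. The graph format, damping factor, and function name are my own illustrative assumptions, not Google's actual implementation.

# Minimal sketch of the dangling-link handling described in the PageRank paper:
# pages with no known outgoing links are removed before the power iteration and
# given a score afterwards. Illustrative only, not Google's actual implementation.

def pagerank_with_dangling(links, damping=0.85, tol=1e-8, max_iter=100):
    """links: dict mapping each page to a list of pages it links to."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    dangling = {p for p in all_pages if not links.get(p)}

    # Step 1: drop dangling pages (and links pointing at them) from the system.
    core = all_pages - dangling
    core_links = {p: [t for t in links[p] if t in core] for p in core}

    # Step 2: run the usual power iteration on the remaining graph.
    rank = {p: 1.0 / len(core) for p in core}
    for _ in range(max_iter):
        new_rank = {p: (1 - damping) / len(core) for p in core}
        for p, targets in core_links.items():
            for t in targets:
                new_rank[t] += damping * rank[p] / len(targets)
        if sum(abs(new_rank[p] - rank[p]) for p in core) < tol:
            rank = new_rank
            break
        rank = new_rank

    # Step 3: add the dangling pages back in. Each one simply receives rank from
    # the pages that link to it; it has nothing to pass on, so nothing else moves.
    for d in dangling:
        rank[d] = (1 - damping) / len(core)
        for p in core:
            if d in links[p]:
                rank[d] += damping * rank[p] / len(links[p])
    return rank

# Toy graph: page C has no outgoing links, so it sits out of the loop above and
# only picks up a score in step 3.
print(pagerank_with_dangling({"A": ["B", "C"], "B": ["A"], "C": []}))

In that toy graph, C only receives a score after the main loop has finished, which is the sense in which dangling links "do not affect the ranking of any other page directly."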

Forum discussion at Cre8asite Forums.

Comments:

Michael Martinez

07/31/2007 05:05 pm

It's still conceptually valid since the PageRank calculation process is only concerned with mapping value between documents in the collection. A number of improved methods for calculating PageRank have been proposed through the years but they all have to factor out dangling links until the last iteration.

Chris Beasley

07/31/2007 09:41 pm

You (& Ammon) are reading the paper wrong. In that section of the paper, Brin & Page are specifically referring to the process of going through a number of iterations of the PageRank algorithm to fully calculate the PageRank of each page. They're saying that since these links are not going to affect any other page on the Internet, they do not need to be included in all of those iterations, so they are removed to save server processing resources and then added back in for the last run. So the pages still get PageRank, are still part of the algorithm, and still matter; they just aren't included in the hundreds or thousands of iterations Google runs until the very end.
