New Google Sandbox Theory: "Flattening Effect of Page Rank Iterations"

Apr 28, 2006 • 7:37 am | comments (7) | Filed Under Google Search Engine Optimization
 

There is a new WebmasterWorld thread that made it to the front page very quickly, named Flattening Effect of Page Rank Interations - explains the "sandbox"? I feel like I have to quote the majority of the post for you to understand this new Sandbox theory, which many in the thread find "refreshing" and "intelligent."

Note the PageRank equation (sans filters) is:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

The first observation about this equation is that it can only be calculated after a statistically significant number of iterations.

If you analyze a site with 5 pages that all link to each other (the homepage having an initial PageRank of roughly 3.5), what you see in the first iteration of PageRank is that the homepage is PR 3.5, and all other pages are PR 0.365 – the largest PR gap that will ever exist through multiple iterations in this example.

This homepage PR represents a surge in PR because Google has not yet calculated PR distribution, therefore the homepage has an artificial and temporary inflation of PR (which explains the sudden and transient PR surge and hence SERPs).

In the second iteration, the homepage goes down to PR 1.4 (a drop of over 50%!), and the secondary pages get lifted to 0.9, explaining the disappearing effect of “new” sites. Dramatic fluctuations continue until about the 12th iteration, when the homepage equilibrates at about a lowly 2.2, with other pages at about 0.7.

I believe that the duration of the “sandbox” is the same amount of time it takes Google to iterate through its PageRank calculations.

Therefore, I think that the “sandbox” is nothing other than the time it takes Google to iterate through the number of calculations uniquely needed to equilibrate the volume of links for a given site.
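
To make the quoted arithmetic concrete, here is a minimal sketch of the iterative calculation in Python. It assumes a hypothetical five-page site with a hub-and-spoke link structure (the homepage links to four subpages, each subpage links back to the homepage), the classic damping factor d = 0.85, and the same starting values as the example above; the exact link structure behind the thread's numbers is not stated, so treat this as an illustration rather than a reconstruction.

d = 0.85  # damping factor from the classic PageRank formula

# Hypothetical link graph: the homepage links to four subpages,
# and each subpage links back to the homepage.
links = {
    "home": ["p1", "p2", "p3", "p4"],
    "p1": ["home"],
    "p2": ["home"],
    "p3": ["home"],
    "p4": ["home"],
}

# Starting values mirroring the quoted example's first iteration.
pr = {"home": 3.5, "p1": 0.365, "p2": 0.365, "p3": 0.365, "p4": 0.365}

for i in range(1, 21):
    # One iteration = one full pass: every page's PR is recomputed from the
    # previous pass using PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).
    new_pr = {}
    for page in pr:
        inbound = [src for src, outs in links.items() if page in outs]
        new_pr[page] = (1 - d) + d * sum(pr[src] / len(links[src]) for src in inbound)
    pr = new_pr
    print(f"iteration {i:2d}: home = {pr['home']:.3f}, subpage = {pr['p1']:.3f}")

On the first recomputation the homepage drops from 3.5 to about 1.4 (the "over 50%" drop described above) while the subpages rise to about 0.9; the values then oscillate and, after a dozen or so passes, settle near 2.4 for the homepage and 0.66 for each subpage. The settling values depend on the assumed link structure, so they differ a bit from the 2.2 and 0.7 in the thread, but the qualitative point is the same: the numbers jump around until enough passes have been run.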

Did you digest that? WebmasterWorld Administrator tedster adds that deep links may "short circuit the flattening effect that PR iterations might produce, especially if they were added at decent intervals." To which another WebmasterWorld Administrator, trillianjedi, adds, "You have to begin to consider whether actually the entire PageRank system of old has been replaced with something entirely different....." But he continues to explain that this and all the other theories are speculation, which is why these threads are so enjoyable.

Google has continued to say that PageRank is still used and is part of its algorithms. Many SEOs believe it is now only used (1) to determine which site should rank higher when site A and site B are equal in all other characteristics (like that ever happens) and (2) to determine the crawl frequency of certain documents. But maybe this theory is right and Google really is using PageRank this way, or maybe it is wrong. Who knows....

Forum discussion at WebmasterWorld.

Update: Please read the comments on this entry.

 

Comments:

Aaron Rubin

04/28/2006 04:33 pm

The first iteration of the PR calculation for the entire web is highly inaccurate; it has to be run iteratively to produce what we know as PR (otherwise all links are equal). Since it's run web-wide with a large number of iterations, the site PR is also calculated correctly the first time. Which makes the hypothesis, um, obviously not accurate. Am I missing something?

Michael Martinez

04/28/2006 04:33 pm

Barry, it's posts like that one which justify my ignoring WebmasterWorld's rantings for the most part. I admire the way you sift through all the irrational stuff looking for the gems. But whoever wrote that doesn't understand what the PageRank algorithm is doing. All the iterations are done at one time. Anyone who wants to calculate PageRank on a document collection has to run through multiple iterations as part of the same processing. It's not like you run an iteration today, then you run an iteration tomorrow, etc. So, Google decides, "We're going to do 50 iterations for this PageRank update". Then they just start doing the calculations. An iteration is simply one pass through the database. The formula is crude and requires multiple passes through the database in order to achieve a close approximation of what the actual PageRank may be. One iteration per update is completely useless.

Barry Schwartz

04/28/2006 04:38 pm

Hence the forum discussion.

joe banner

04/28/2006 06:08 pm

Bravo, Michael Martinez. Look at the title of this post: "interation" is not even a word. Si stih na tatmept ta SEO msispleling?

Marcia

04/28/2006 07:39 pm

>>Did you digest that? WebmasterWorld Administrator tedster adds that deep links may "short circuit the flattening effect that PR iterations might produce, especially if they were added at decent intervals." To which another WebmasterWorld Administrator, trillianjedi, adds "You have to begin to consider whether actually the entire PageRank system of old has been replaced with something entirely different....."<< There's a white paper that deals with something related to this very thing. I've got a bunch collected; I'll try to dig it out. I'd like to hear Michael's comments on it. The thing is, there have to be however many iterations for PageRank to converge. It doesn't resonate with me that it takes many months for PR to converge.

Search Engines WEB

04/28/2006 09:27 pm

"Many SEOs believe it is only used now (1) to determine which site should rank higher when site A and site B are equal in all other characteristics (like that ever happens) " In theory, when all things are equal - ANY VARIABLE should cause an change in rankings for a given keyword. In other words, any change in the Title Tag, or a Change in Bodytext, or a change in the keywords in popular Back link. Would "one-up" a site. Google probably BALANCES all the variables in varying degrees - and the so-called Updates - are just a re-prioritization of the ALGOS. There has possibly been a re calculation of the PageRank... - years ago it was basically QUANTITY - now it seems more focused on the PageRank and TRUSTRAK of the Back Links. (quality link neighborhoods) It may also be a possibility that inherent statistical flaws exist in Google's highly complexed algos, that also account for these fluctuations.

Michael Martinez

05/01/2006 03:41 pm

I don't pretend to know how long it takes Google to calculate PageRank. That they only publish Toolbar PR once every 3 months (give or take) doesn't necessarily mean they are taking that long to calculate the actual PageRank. Of course, Mike Grehan likes to cite people at Yahoo! and Ask who claim that Google never fully implemented PageRank. But even if they have fully implemented it, if they are following the classic formula described by Sergey and Larry in the original "Backrub" paper at Stanford, PageRank is only added to the relevance scores. It's never been described as the most crucial ranking factor by Google in any publication I have seen. People continue to make too much fuss over it. So regardless of whether they do 10 iterations a day or 50 iterations a quarter (and "iteration" is a Computer Science and Mathematics term that has been used for decades, if not longer), there are far too many people obsessing about PageRank. It would help the Webmastering community in general, and the SEO community in particular, if all SEO forums had the good sense to ban PageRank discussions for 6 months out of every year.
