WebmasterWorld administrator Tedster posted a thread at WebmasterWorld discussing an updated Google patent named Information retrieval based on historical data. This is one of the more popular Google documents over the years, and the source of much of the Sandbox theorizing.
In any event, Tedster pulled out several abstracts that are new in this document. I will highlight only two that I find would be most valuable to our readers.
(1) How does Google know when a site has changed enough that it should drop all the past trust and link popularity associated with that site?
...if the content of a document changes such that it differs significantly from the anchor text associated with its back links, then the domain associated with the document may have changed significantly (completely) from a previous incarnation. This may occur when a domain expires and a different party purchases the domain... All links and/or anchor text prior to that date may then be ignored or discounted.
So it is not just about changes to the domain name registration information. That is why many folks who buy sites try to keep the same style and category of content on that domain.
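To make the idea concrete, here is a rough sketch (my own illustration, not anything from the patent) of how a search engine might compare a page's current content against the anchor text of its historical back links. The similarity measure and the 0.2 threshold are my assumptions:

```python
def token_set(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())

def content_matches_anchors(page_text, anchor_texts, threshold=0.2):
    """Return True if the page still resembles its historical anchor text.

    Pools the anchor text of all known back links and measures what
    fraction of those anchor tokens still appear on the page. A very low
    overlap suggests the domain may have changed hands, in which case
    the old links and anchor text could be discounted or ignored.
    """
    anchor_tokens = set()
    for anchor in anchor_texts:
        anchor_tokens |= token_set(anchor)
    if not anchor_tokens:
        return True  # nothing to compare against
    overlap = len(anchor_tokens & token_set(page_text)) / len(anchor_tokens)
    return overlap >= threshold

# Example: an expired knitting domain re-purposed for payday loans
old_anchors = ["hand knitting patterns", "free knitting guide"]
same_topic = "knitting patterns and guides for hand knitting"
new_topic = "fast payday loans apply online instant cash approval"
```

On the same-topic page the anchor tokens still overlap heavily, so the old links would be kept; on the re-purposed page the overlap drops to zero, which is exactly the "completely changed incarnation" signal the excerpt describes.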
(2) We have heard it before: "Don't acquire links too quickly," because it seems unnatural. Well, here it is on paper:
The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly.
A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine (to obtain a higher ranking and, thus, better placement in search results) by exchanging links, purchasing links, or gaining links from documents without editorial discretion on making links.
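As a rough illustration of the spike detection the excerpt describes (my own sketch; the window size and spike factor are assumptions, not figures from the patent), one could bucket new back links into time windows and flag any window that dwarfs the baseline of the others:

```python
from collections import Counter

def link_spike_windows(link_dates, window_days=7, spike_factor=10):
    """Flag time windows whose new-link count far exceeds the baseline.

    link_dates: list of integer days on which new back links appeared.
    Each window's count is compared against the average of all other
    windows; a window more than spike_factor times that baseline is
    flagged. Per the patent language, a flagged spike is only a signal:
    it might be a genuine topical event (the CDC during an outbreak)
    or a link-buying spree, so it would feed further analysis rather
    than trigger a penalty by itself.
    """
    counts = Counter(day // window_days for day in link_dates)
    flagged = []
    for window, n in counts.items():
        others = [m for w, m in counts.items() if w != window]
        if not others:
            continue  # a single window gives no baseline to compare
        baseline = sum(others) / len(others)
        if n > spike_factor * baseline:
            flagged.append(window * window_days)
    return flagged
```

For example, ten links trickling in over ten days and then one hundred links landing on day 14 would flag only the window containing day 14.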
Yes, most sites don't get 50,000 links overnight. But for some sites it is possible, for several reasons. So how does Google determine which sites naturally received these links so quickly? Well, if I understand this correctly, they look at how quickly those links go away and at the "dynamic-ness" of the links. Here are those explanations from the document:
According to a further implementation, the analysis may depend on the date that links disappear. The disappearance of many links can mean that the document to which these links point is stale (e.g., no longer being updated or has been superseded by another document). For example, search engine 125 may monitor the date at which one or more links to a document disappear, the number of links that disappear in a given window of time, or some other time-varying decrease in the number of links (or links/updates to the documents containing such links) to a document to identify documents that may be considered stale. Once a document has been determined to be stale, the links contained in that document may be discounted or ignored by search engine 125 when determining scores for documents pointed to by the links.
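As a rough illustration of this disappearing-links test (again my own sketch, not the patent's implementation; the window and loss ratio are assumptions), one could compare snapshots of a document's inbound links over time:

```python
def looks_stale(snapshots, window=3, loss_ratio=0.5):
    """Guess whether a document is stale from its inbound-link history.

    snapshots: chronological list of sets of URLs linking to the document,
    one set per crawl. Returns True if, over the last `window` snapshots,
    the document lost more than loss_ratio of the links it had at the
    start of that window. Links contained in a document judged stale
    could then be discounted or ignored when scoring the pages it
    points to, as the excerpt describes.
    """
    if len(snapshots) < window + 1:
        return False  # not enough history to judge
    start = snapshots[-(window + 1)]
    if not start:
        return False
    lost = len(start - snapshots[-1])
    return lost / len(start) > loss_ratio
```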
According to another implementation, the analysis may depend, not only on the age of the links to a document, but also on the dynamic-ness of the links. As such, search engine 125 may weight documents that have a different featured link each day, despite having a very fresh link, differently (e.g., lower) than documents that are consistently updated and consistently link to a given target document. In one exemplary implementation, search engine 125 may generate a score for a document based on the scores of the documents with links to the document for all versions of the documents within a window of time. Another version of this may factor a discount/decay into the integration based on the major update times of the document.
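The "dynamic-ness" idea boils down to asking how consistently a linking page points at the target across its crawled versions within a window. A minimal sketch of that consistency score (my own illustration, assuming equal weight per version):

```python
def link_consistency(versions, target):
    """Fraction of a linking document's recent versions that link to target.

    versions: list of sets of outbound URLs, one per crawled version of
    the linking document within some time window. A page that rotates a
    "featured link" daily scores low for any one target even though the
    link is fresh; a page that links to the target in every version
    scores 1.0, and its link could be weighted more heavily when
    scoring the target, as the excerpt suggests.
    """
    if not versions:
        return 0.0
    return sum(target in version for version in versions) / len(versions)
```

So a rotating featured link that hits the target in only one of three versions scores about 0.33, while a persistent sidebar link scores 1.0, which matches the lower versus higher weighting the patent describes.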
Tedster goes a bit deeper into signs of the old supplemental index, which I did not cover here.
Forum discussion at WebmasterWorld.