Updated Google Patent Hints At Linkage Penalties and Site Expiry

Sep 19, 2008 • 7:59 am | comments (3) | Filed Under Google Search Engine Optimization

WebmasterWorld administrator Tedster posted a thread discussing an updated Google patent named "Information retrieval based on historical data." It has been one of the more popular Google documents over the years, and it is where many of the Sandbox theories came from.

In any event, Tedster pulled out several excerpts that are new in this document. I will highlight only the two that I think are most valuable to our readers.

(1) How does Google know when a site has changed enough that it should drop all the past trust and link popularity associated with that site?

...if the content of a document changes such that it differs significantly from the anchor text associated with its back links, then the domain associated with the document may have changed significantly (completely) from a previous incarnation. This may occur when a domain expires and a different party purchases the domain... All links and/or anchor text prior to that date may then be ignored or discounted.

So it is not just about a change in the domain name registration information. That is why many folks who buy sites try to keep the same style and category of content on that domain.
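
The patent does not say how the content-versus-anchor-text comparison is made. As a rough illustration only, here is a minimal Python sketch that assumes a simple term-overlap measure. The function names, the tokenizer, and the 0.2 threshold are all hypothetical and are not taken from the patent.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def anchor_overlap(page_text, anchor_texts):
    """Fraction of distinct anchor-text terms that also appear on the page."""
    anchor_terms = set()
    for anchor in anchor_texts:
        anchor_terms |= tokens(anchor)
    if not anchor_terms:
        return 1.0  # no anchor text, so nothing for the page to contradict
    return len(anchor_terms & tokens(page_text)) / len(anchor_terms)


def looks_like_new_owner(page_text, anchor_texts, threshold=0.2):
    """Hypothetical rule: flag a page whose content barely matches its anchors."""
    return anchor_overlap(page_text, anchor_texts) < threshold


anchors = ["blue widget reviews", "widget buying guide"]
old_page = "Widget reviews and a buying guide for blue widgets"
new_page = "Cheap online poker and casino bonuses"

print(looks_like_new_owner(old_page, anchors))  # content still matches the anchors
print(looks_like_new_owner(new_page, anchors))  # content has drifted away
```

Under this toy measure, a buyer who keeps the same category of content keeps the overlap high, which is consistent with the advice above.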

(2) We have heard it before: "Don't get links too quickly," because it seems unnatural. Well, here it is on paper:

The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly.

A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine (to obtain a higher ranking and, thus, better placement in search results) by exchanging links, purchasing links, or gaining links from documents without editorial discretion on making links.
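
To picture what "a large spike in the quantity of back links" might look like in data, here is a small Python sketch. It assumes we have a list of new back links counted per day and compares each day against the average of the preceding week. The window of 7 days and the factor of 5 are made-up values for illustration, and the patent does not describe this particular test.

```python
def link_spikes(daily_new_links, window=7, factor=5.0):
    """Return the indexes of days whose new-link count exceeds `factor`
    times the average of the previous `window` days (hypothetical rule)."""
    spikes = []
    for day in range(window, len(daily_new_links)):
        baseline = sum(daily_new_links[day - window:day]) / window
        # Use a floor of 1.0 so that a quiet site is not flagged for 2 or 3 links.
        if daily_new_links[day] > factor * max(baseline, 1.0):
            spikes.append(day)
    return spikes


# Seven ordinary days, one burst of 400 new links, then back to normal.
counts = [3, 4, 2, 5, 3, 4, 3, 400, 5]
print(link_spikes(counts))  # the burst is on day index 7
```

Note that a flag like this only says a spike happened. As the excerpt points out, the same spike could be a topical event or a spam attempt, so something else has to tell the two apart.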

True, most sites do not get 50,000 links overnight. But for some sites it is possible, for several reasons. So how does Google determine which sites received these links so quickly for natural reasons? Well, if I understand this correctly, they look at how quickly those links go away and at the "dynamic-ness of the links." Here are those explanations from the document:

According to a further implementation, the analysis may depend on the date that links disappear. The disappearance of many links can mean that the document to which these links point is stale (e.g., no longer being updated or has been superseded by another document). For example, search engine 125 may monitor the date at which one or more links to a document disappear, the number of links that disappear in a given window of time, or some other time-varying decrease in the number of links (or links/updates to the documents containing such links) to a document to identify documents that may be considered stale. Once a document has been determined to be stale, the links contained in that document may be discounted or ignored by search engine 125 when determining scores for documents pointed to by the links.
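
One of the signals named in that passage is "the number of links that disappear in a given window of time." A minimal Python sketch of that count follows. The 30-day window, the 50% loss ratio, and the function names are assumptions made for the example, not values from the patent.

```python
from datetime import date, timedelta


def lost_links_in_window(disappearance_dates, window_end, window_days=30):
    """Count links whose disappearance date falls inside the window."""
    window_start = window_end - timedelta(days=window_days)
    return sum(1 for d in disappearance_dates if window_start < d <= window_end)


def may_be_stale(disappearance_dates, links_before, window_end,
                 window_days=30, max_loss_ratio=0.5):
    """Hypothetical rule: a document that loses at least half of its
    back links within the window is treated as possibly stale."""
    if links_before == 0:
        return False
    lost = lost_links_in_window(disappearance_dates, window_end, window_days)
    return lost / links_before >= max_loss_ratio


gone = [date(2008, 6, 1), date(2008, 9, 1), date(2008, 9, 5), date(2008, 9, 10)]
today = date(2008, 9, 19)

print(lost_links_in_window(gone, today))             # links lost in the last 30 days
print(may_be_stale(gone, links_before=5, window_end=today))
print(may_be_stale(gone, links_before=100, window_end=today))
```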

According to another implementation, the analysis may depend, not only on the age of the links to a document, but also on the dynamic-ness of the links. As such, search engine 125 may weight documents that have a different featured link each day, despite having a very fresh link, differently (e.g., lower) than documents that are consistently updated and consistently link to a given target document. In one exemplary implementation, search engine 125 may generate a score for a document based on the scores of the documents with links to the document for all versions of the documents within a window of time. Another version of this may factor a discount/decay into the integration based on the major update times of the document.
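
The "all versions of the documents within a window of time" idea can be sketched in a few lines of Python. This is one possible reading, not the patent's formula: each linking page contributes its own score multiplied by the share of its saved versions that actually linked to the target. The data layout and names are hypothetical, and the decay based on major update times is left out.

```python
def windowed_link_score(snapshots_by_source, source_scores, target):
    """Sum each source's score, weighted by the fraction of its versions
    in the window that contained a link to `target` (illustrative only)."""
    total = 0.0
    for source, snapshots in snapshots_by_source.items():
        if not snapshots:
            continue
        linking_versions = sum(1 for links in snapshots if target in links)
        total += source_scores[source] * linking_versions / len(snapshots)
    return total


# Four daily versions of two linking pages, each version listed as its outgoing links.
snapshots = {
    "steady-page": [{"target"}, {"target"}, {"target"}, {"target"}],
    "link-of-the-day": [{"a"}, {"b"}, {"c"}, {"target"}],
}
scores = {"steady-page": 1.0, "link-of-the-day": 1.0}

print(windowed_link_score(snapshots, scores, "target"))
```

In this toy setup the page that links consistently passes its full score, while the page with a different featured link each day passes only a quarter of it, even though its link is the freshest one.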

Tedster goes a bit deeper into signs of the old supplemental index, which I do not cover here.

Forum discussion at WebmasterWorld.



Michael Martinez

09/19/2008 05:11 pm

I would say the use of "signatures" to determine if documents have changed in the Supplemental Index is a reasonable guess, but since Google does now index at least some words from supplemental pages, they appear to be using more than signatures. Or else their signatures have become more sophisticated.

No Name

09/21/2008 05:53 pm

That language he quoted isn't actually updated language -- he said it himself a few posts down.


09/24/2008 10:58 pm

"A typical, "legitimate" document attracts back links slowly." That is crap. Legitimate content? What the heck does that mean? Are these documents "legitimate"? - Dow falls 500 points - AIG in Loan Deal with US - Freddie and Fannie are saved by Gov These will accumulate links slowly? Come on - this stuff comes fast and furious. The guys working on this patent apparently just stumbled from academia where links might grow slowly, just as slowly as Google's realization that evil do'ers will circumvent this "spam detection rule" by knocking their competitors out of the rankings.
