Block (Passage) Level Link Analysis by MSN

Jul 30, 2004 • 8:34 am | Filed Under Bing Search

With all this discussion about the problems with PageRank and HITS, Microsoft recently released a paper proposing a solution to their shortcomings. The basic premise of the paper, which can be downloaded here, is that these algorithms treat every link on a page as equal, when in fact the links are not equal. By breaking a page up into "blocks" or "passages" (as Orion likes to call them in the thread at Search Engine Watch), you can semantically understand what each section of the page is about. Then, based on where a link sits within those blocks, you can determine that link's weight and relevancy.
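To make the "not all links are equal" point concrete, here is a minimal Python sketch; it is not code from the paper. It assumes a page has already been segmented into blocks and that each block carries an importance score. The `Block` type, the block names, the URLs and the importance numbers are all hypothetical. It contrasts classic PageRank's even split of a page's vote over all its links with a block-weighted split, in which each block passes on its own share of the page's importance, divided evenly among the links inside it.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A hypothetical block (visual region) of an already-segmented page."""
    name: str
    importance: float  # assumed share of the page's importance; all blocks sum to 1
    links: list[str] = field(default_factory=list)  # URLs linked from this block


def uniform_link_weights(blocks: list[Block]) -> dict[str, float]:
    """Classic PageRank view: the page's vote is split evenly over all its links."""
    all_links = [url for block in blocks for url in block.links]
    weights: dict[str, float] = {}
    for url in all_links:
        weights[url] = weights.get(url, 0.0) + 1.0 / len(all_links)
    return weights


def block_link_weights(blocks: list[Block]) -> dict[str, float]:
    """Block-weighted view: each block passes on its own importance, split
    evenly over the links it contains. Blocks without links pass on nothing."""
    weights: dict[str, float] = {}
    for block in blocks:
        for url in block.links:
            weights[url] = weights.get(url, 0.0) + block.importance / len(block.links)
    return weights


if __name__ == "__main__":
    page = [
        Block("main article", 0.7, ["a.html", "b.html"]),
        Block("navigation bar", 0.2, ["home.html", "about.html", "contact.html", "faq.html"]),
        Block("ads", 0.1, ["ad1.html", "ad2.html"]),
    ]
    print(uniform_link_weights(page))  # each of the 8 links gets 0.125
    print(block_link_weights(page))    # a.html and b.html get 0.35, the other links 0.05
```

With these made-up importances, a link from the main article carries seven times the weight of a navigation or ad link, whereas the uniform split treats all eight links the same.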

Very interesting idea, though of course it can be abused as well. I, for one, would love to see this working at MSN Search. For discussion, please join the Search Engine Watch thread. Here is a passage from the paper:

Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics and hence the web page might not be considered as the atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we proposed two new algorithms, Block Level PageRank and Block Level HITS, whose performances we study extensively using web data.
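The idea behind Block Level PageRank, as the abstract describes it, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation, and the vision-based segmentation step is not modeled at all. Assumptions: the page-to-block matrix `X` holds each block's importance within its page (made-up numbers here), the block-to-page matrix `Z` spreads each block's vote evenly over the pages it links to, the two are combined into a page-to-page graph by multiplying them (`X @ Z`), rows are renormalized, pages with no usable links spread their rank uniformly, and the damping factor is the conventional 0.85. The function name `block_level_pagerank` is hypothetical.

```python
import numpy as np


def block_level_pagerank(X, Z, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank over the page graph induced by page -> block -> page links.

    X: (n_pages, n_blocks) page-to-block matrix; X[i, j] is the assumed
       importance of block j inside page i (0 if block j is not on page i).
    Z: (n_blocks, n_pages) block-to-page matrix; Z[j, k] is 1/(number of
       pages block j links to) if block j links to page k, else 0.
    """
    W = X @ Z  # page-to-page link weights, shaped by where each link sits
    n = W.shape[0]
    row_sums = W.sum(axis=1)
    has_out = row_sums > 0
    # Row-normalize pages that pass weight on; dangling rows stay all-zero.
    P = np.zeros_like(W, dtype=float)
    P[has_out] = W[has_out] / row_sums[has_out][:, None]

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        dangling = rank[~has_out].sum()  # rank held by pages with no usable links
        new_rank = (1 - damping) / n + damping * (rank @ P + dangling / n)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Pages A, B, C. A has a content block (importance 0.8) linking to B
    # and a navigation block (0.2) linking to C; B and C have one block each.
    X = np.array([[0.8, 0.2, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    Z = np.array([[0.0, 1.0, 0.0],   # A's content block -> B
                  [0.0, 0.0, 1.0],   # A's navigation block -> C
                  [0.0, 0.0, 1.0],   # B's block -> C
                  [1.0, 0.0, 0.0]])  # C's block -> A
    print(block_level_pagerank(X, Z))
```

Setting A's two block importances to 0.5 each reproduces the even split of classic PageRank. With the 0.8/0.2 weighting above, page B ends up with a noticeably larger share of the rank, because the only link pointing to it sits in A's more important block.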

