More On Latent Semantic Indexing

Feb 4, 2005 - 8:57 am

Yesterday I wrote an entry named Latent Semantic Analysis (LSA) - Crawl into the Google Algorithm?, where I discussed how the current theories behind the Google SERP changes point to a new algorithm shift at Google. Many now believe that this has a lot to do with Latent Semantic Indexing. So, as an SEO, if you haven't already, it's time to read up on the papers on this topic. I, Brian posted a new thread with resources to papers on the topic; he thanks SEW Moderator Marcia for the help with the papers. I'll list links to those papers below. Ammon Johns then posted a quote from one source that does a great job of summarizing the topic. In addition, he points to a thread on this topic started in 2002 at Cre8asite Forums named The Semantic Web.

Here is the snippet Ammon quoted in the SEW thread:

Regular keyword searches approach a document collection with a kind of accountant mentality: a document contains a given word or it doesn't, with no middle ground. We create a result set by looking through each document in turn for certain keywords and phrases, tossing aside any documents that don't contain them, and ordering the rest based on some ranking system. Each document stands alone in judgement before the search algorithm - there is no interdependence of any kind between documents, which are evaluated solely on their contents.
Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
When you search an LSI-indexed database, the search engine looks at similarity values it has calculated for every content word, and returns the documents that it thinks best fit the query. Because two documents may be semantically very close even if they do not share a particular keyword, LSI does not require an exact match to return useful results. Where a plain keyword search will fail if there is no exact match, LSI will often return relevant documents that don't contain the keyword at all.
[ Source: http://javelina.cet.middlebury.edu/lsa/out/lsa_definition.htm]
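To make the quote a bit more concrete, here is a minimal Python sketch of the idea, not Google's algorithm or anything from the papers. It assumes NumPy, a made-up five-document corpus, raw word counts, and a truncated SVD with two dimensions; the function names and the corpus are hypothetical and exist only for illustration. It compares a plain keyword match for "car" with LSI-style similarity scores.

```python
import numpy as np

# Toy corpus (made up for illustration): three car documents, two fruit documents.
docs = [
    "car engine repair",
    "automobile engine repair",   # never mentions "car"
    "car automobile dealer",
    "banana fruit recipe",
    "fruit banana smoothie",
]


def keyword_match(texts, query):
    """Plain keyword search: a document either contains the word or it doesn't."""
    return [query in text.split() for text in texts]


def build_term_document_matrix(texts):
    """Return a (terms x documents) count matrix and a word -> row index map."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return matrix, index


def lsi_similarities(texts, query, k=2):
    """Cosine similarity between the query and each document in a k-dim latent space."""
    A, index = build_term_document_matrix(texts)
    # Singular value decomposition of the whole collection: A = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    docs_k = Vt[:k, :].T  # one row of latent coordinates per document

    # Build the query as a term-count vector, then fold it into the latent space.
    q = np.zeros(A.shape[0])
    for word in query.split():
        if word in index:
            q[index[word]] += 1.0
    q_k = (q @ U_k) / s_k

    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    return docs_k @ q_k / np.maximum(norms, 1e-12)  # guard against divide-by-zero


matches = keyword_match(docs, "car")
sims = lsi_similarities(docs, "car")
for text, hit, sim in zip(docs, matches, sims):
    print(f"{text!r}: keyword match={hit}, LSI similarity={sim:.2f}")
```

In this toy setup the keyword search skips "automobile engine repair" because the word "car" never appears in it, while the LSI score for that document comes out close to the scores of the documents that do contain "car", and the two fruit documents score near zero. A real system would work on a far larger collection, usually with term weighting and many more dimensions.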

Here is a listing of papers on the LSA topic from the thread:

Added: Check out SEO Book's LSI post, very detailed and easy to read. Good work.

 
