“End users want to achieve their goals with a minimum of cognitive load and a maximum of enjoyment.” ~ Marchionini. Why? Because search users are nitwits. Mike asks us to consider the following: what if someone goes into a travel store and, when asked what he is looking for, answers “travel”? He goes on to describe what it takes to get ranked in the top ten. Social sciences and bibliometrics are also mentioned on the screen; they have existed for a long time, since well before search engines, and they are being applied today in the algorithms created for search engines. The web is a social network, he continues, and social networks were researched extensively long before the web. He describes citation analysis and how it is applied in search engines. There is a difference between a citation and a reference.
Hyperlink analysis algorithms make one or both of these simple assumptions. Assumption 1: a hyperlink from page A to page B is a recommendation of page B by the author of page A. Assumption 2: if page A and page B are connected by a hyperlink, they might be on the same topic. Co-citation: if a page C cites pages A and B, then A and B are said to be co-cited by C. Pages A and B being co-cited by many other pages is evidence that A and B are related. There are two main algorithms based on links. PageRank (Google): each page on the web has a measure of prestige that is independent of any information need or query, i.e. it is keyword independent. Roughly speaking, the prestige of a page is proportional to the sum of the prestige scores of the pages linking to it. HITS, or Hyperlink-Induced Topic Search, is the other. The problem, Mike says, is that neither of these algorithms works.
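To make the two algorithms a little more concrete, here is a minimal sketch of both on a toy link graph. It is my own illustration and not anything shown in the session: the page names, the damping factor of 0.85, and the iteration counts are all assumptions. PageRank is computed once for the whole graph (keyword independent), while HITS is normally run on the small set of pages retrieved for one query, which is where the runtime concern comes from.

```python
import math


def all_pages(links):
    """Collect every page that appears as a link source or a link target."""
    return sorted(set(links) | {t for targets in links.values() for t in targets})


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. damping=0.85 is an assumed, commonly used value."""
    pages = all_pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            # A page with no out-links spreads its prestige over every page.
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


def hits(links, iterations=20):
    """Hub and authority scores for a (query-specific) set of pages."""
    pages = all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link to you.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages you link to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both score sets so the numbers stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy graph: three list pages pointing at three sites.
toy_web = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB", "siteC"],
    "hub3": ["siteA"],
}

prestige = pagerank(toy_web)
hub_scores, authority_scores = hits(toy_web)
print(max(prestige, key=prestige.get))
print(max(authority_scores, key=authority_scores.get), max(hub_scores, key=hub_scores.get))
```

In this toy graph the most linked-to site comes out on top under both methods; the difference in practice is when the computation happens, not what a link is taken to mean.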
The problems with HITS: topic drift, nepotistic linking, and runtime analysis. Mike says there are three steps to success. They cracked the runtime problem, taking the time of a search from 11 seconds to instant. He describes Teoma and subject-specific popularity. Adventures in search algorithms: what happened next? Both Krishna Bharat and Monika Henzinger joined Google. Mike believes that Florida, the Google update, marked the move from keyword independent to keyword dependent. Ending joke: a guy is trapped in the desert and is looking for life. He finds a man face down in the sand, with a bag on his back. He wonders what was in the bag that would have saved the man. Answer: a parachute.
Next up was Rahul Lahiri, who presents some of the properties that Ask Jeeves controls. Today they are ranked #7 on the web and have done exceedingly well since this time last year. What is their mission? Relevance. He goes into general link analysis methods. The challenge is discovering what the links are about. A link from page A to page B (or C) is a vote or recommendation by the author of page A for page B (or C). The problem is that if you have a link with the anchor text “budget”, you don’t know what “budget” means. Is it Budget Rent-A-Car, or a budget for someone’s company? That’s a problem, obviously. He continues with organizing sites into local subject communities, which is how Teoma views the web. One of the challenges they face is solving the problem in real time: they have 200 ms (milliseconds) to do this computation for each query, millions of times per day. You also have to identify the communities, and the link structure of the web is noisy. Hubs link to topic-specific pages. He gives an example of topic-focused vs. broad topic areas: topic focused is a search for “buffalo”, and broad topic area is a search for “bay area airports”. One of the benefits is that smaller enthusiast sites get a chance to come up to the top of the search listings (example search: fantasy football). The power of communities is a better vision, expert validation, contextualization, and a better user experience.
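The community idea can be sketched in a few lines. This is only my rough illustration of why a small enthusiast site can outrank a big general one, not Teoma's actual method: the site names are made up, and the community membership is simply given by hand here, whereas identifying the communities is exactly the hard part he describes.

```python
from collections import Counter


def inlink_votes(links, voters=None):
    """Count in-links per page, optionally counting only links from `voters`."""
    votes = Counter()
    for source, targets in links.items():
        if voters is not None and source not in voters:
            continue  # this page is outside the community, so its vote is ignored
        for target in set(targets):
            if target != source:  # a page cannot vote for itself
                votes[target] += 1
    return votes


# Hypothetical link graph: who links to whom.
links = {
    "ff-fan-blog": ["ff-stats"],
    "ff-forum": ["ff-stats", "big-portal"],
    "ff-stats": ["ff-forum"],
    "news-site": ["big-portal"],
    "shopping-site": ["big-portal"],
}

# Assumed to be known already: the pages in the "fantasy football" community.
fantasy_football = {"ff-fan-blog", "ff-forum", "ff-stats"}

raw = inlink_votes(links)
within_community = inlink_votes(links, voters=fantasy_football)

print(raw.most_common(1))
print(within_community.most_common(1))
```

Counting every link, the big portal wins on raw popularity. Counting only the votes cast from inside the fantasy football community, the enthusiast stats site comes out on top.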
Next to present was Dr. E. Garcia, a pioneer who has helped us as marketers better understand the search engines. His plane has been delayed until tomorrow because of the weather (it’s snowing heavily here), BUT there is a voice-over for his presentation. The tape starts. He is going to discuss grasping co-occurrence. Co-occurrence suggests association or relatedness. Side note: people are leaving because the audio isn’t too great, but not too many, as there is a good amount of interest in this. Back to co-occurrence, which can be global, local, or fractal. This presentation is highly technical, and while I understand his work, it’s hard to follow. I am trying to get what I can, as it’s requiring very detailed listening and comprehension at this point. I apologize for any errors in this document.
He gives the example of “Hawaii”, which is semantically connected to aloha, Hawaiian, and Maui. C-indices can be used to estimate the relative presence of targeted keywords across search engines. He gives another example, “comida + mexicana”, two terms that are semantically connected. C-indices can also be used to monitor keyword trends, word patterns and topics over time. He goes on to talk about competitive words. Based on his research, the example suggests that many competitive queries in Google tend to exhibit C12 indices. His research indicates that overused queries tend to exhibit unusually high C-indices, while unrelated terms in a query tend to exhibit very small C-indices. He gives the example of “guacamole optimization”, with a low C-index of 0.12. On to term sequencing: EF-ratios. He talks about various types of queries, such as FINDALL and EXACT, and how order and frequency matter. He goes on to give the example that EF-ratios can be used to estimate the relative frequency of natural sequences and phrases in a source. So what about candidate sequences? EF-ratios can be used to examine how easy or difficult it would be to rank for a given sequence in a given search. Keyword competitiveness is specific to each search engine. Some search engines return documents in which the sequence can be found. When queried in EXACT mode, some search engines return docs in which the queried terms can be found, and the question is what separates them: a delimiter (hyphen, underscore), a space, or stopwords (in, of, with). So to recap, co-occurrence theory can be used to understand semantic associations between terms, products, and services.
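For anyone who wants to play with these numbers, here is a small sketch of how such indices could be computed from result counts. The formulas were not captured in my notes, so both are assumptions on my part: the C-index is taken as a Jaccard-style overlap of the result counts for two terms, and the EF-ratio as the EXACT-mode count divided by the FINDALL-mode count. The scale may also differ from the one used in the talk (for instance, values expressed per thousand), and the counts below are made up.

```python
def c_index(n1, n2, n12):
    """Assumed co-occurrence index for two terms.

    n1, n2: result counts for each term searched on its own.
    n12:    result count for both terms searched together.
    """
    union = n1 + n2 - n12
    if union <= 0:
        raise ValueError("counts must describe at least one matching document")
    return n12 / union


def ef_ratio(exact_count, findall_count):
    """Assumed EF-ratio: EXACT-mode result count over FINDALL-mode result count."""
    if findall_count <= 0:
        raise ValueError("FINDALL count must be positive")
    return exact_count / findall_count


# Hypothetical result counts, not taken from any real search engine.
related = c_index(n1=1000, n2=500, n12=400)
unrelated = c_index(n1=1000, n2=500, n12=2)
phrase = ef_ratio(exact_count=50, findall_count=200)

print(round(related, 3), round(unrelated, 4), phrase)
```

Under these assumptions, terms that mostly appear together score close to 1, terms that rarely share a document score close to 0, and an EF-ratio near 1 means the terms usually appear as the exact sequence.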
Q: Interested in how we will be searching in 5-10 years’ time. Personalization? A: Where is search going? Mike did an interview with the founder of Teoma, which he says was interesting. The most interesting part was that he said they need to get 10 steps up the ladder, and currently we are at 3-4. The one thing that will change this is personalization. Personalization is misunderstood. It’s not about giving you a search just for you; it’s about returning results for your peer group. From there they can start to tailor the search specifically to you. There is work now using genetic algorithms, and others are using these to create search engines. Mike concludes that the more information we give the search engines, the better our experience will be.