Search Algorithm Research and Patents

Feb 28, 2006 • 12:29 pm | Filed Under Search Engine Strategies 2006 New York
 

Moderated by Detlev Johnson – Position Technologies. Very small room for this topic; it was already full five minutes before the start. (Added: when I left the session a little early, there were literally 50 people standing outside the door trying to hear this topic.) Detlev opens: welcome, everybody; we are going to introduce the idea of what an algorithm is. Basically, an algorithm uses a computer to rank sites in order of some “pseudo-relevancy.”

Rand Fishkin – SEOMoz.org

“Understanding Search Engine Algorithms Requires Serious Research.” This is an advanced track session, so he hopes people already have a basic understanding. His outline: Why study search algos? Where can you find research? What has the SEO community learned from algo analysis?

Why study search algos? To gain an understanding of how SEs work. Potential clients, managers, and your staff and team will thank you. A strong understanding also makes it easier to judge keyword difficulty (how hard a term will be to optimize for). Where is research published? (see: seomoz.org/blogdetails.php?ID=850) Some places: patents and patent applications, university research, books, IR (information retrieval) and SEO blogs, conferences, and sites. Use the IR sites if you want to talk about the “super complicated issues.”

What have we learned from algo analysis? Google’s classic algo is PageRank: a map of the web (its link graph) can be constructed, etc. After PageRank, Jon Kleinberg at IBM used the “HITS” algo, which asks what links say about a site by scoring pages as hubs and authorities; the addition of IBM’s “CLEVER” project allowed for further research. (Note: please search for both of these to find out more about them.)

More recent algo analysis includes TrustRank. SEs want to be able to put trust into certain sites and take it away from others, and they don’t want paid links to influence rankings. Rand feels that reciprocal links, “spam islands,” and other FFA (free-for-all) link “schemes” may be targeted. Guest book and blog comment links may not always be the best sources of relevance. He told a story about an interview he gave: the writer of the article later spoke to Matt Cutts about a site Rand had mentioned, which was buying links from the Harvard Crimson, which was (emphasis on “was”) a good .edu site to buy links from. Suddenly the site’s rankings dropped. (My thought: d’oh! That must be why Jim Boykin says to never tell Matt Cutts your URLs.)
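To make the link-analysis ideas above a bit more concrete, here is a minimal textbook-style sketch on a toy graph, not any engine’s production algorithm. It assumes the common 0.85 damping factor and plain power iteration; the TrustRank-style score is modeled the way the published idea describes it, as PageRank whose random jump only lands on a hand-picked set of trusted seed pages. The page names are made up.

```python
def _all_pages(links):
    # Every page that appears as a source or a target of a link.
    return sorted(set(links) | {t for outs in links.values() for t in outs})


def pagerank(links, damping=0.85, iterations=50, teleport=None):
    """Power-iteration PageRank on a dict of page -> list of outlinks.

    teleport: optional dict of page -> weight for the random jump. None means a
    uniform jump (classic PageRank); weights on a few hand-picked trusted seed
    pages give a TrustRank-style score (an illustrative assumption here).
    """
    pages = _all_pages(links)
    if teleport is None:
        teleport = {p: 1.0 for p in pages}
    total = sum(teleport.values())
    jump = {p: teleport.get(p, 0.0) / total for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new = {p: (1 - damping) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank according to the jump vector.
                for q in pages:
                    new[q] += damping * rank[p] * jump[q]
        rank = new
    return rank


def hits(links, iterations=50):
    """Kleinberg's HITS: returns (hub, authority) scores, each L2-normalized."""
    pages = _all_pages(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page's hub score is the sum of authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Toy web (hypothetical): a small "good" cluster and a two-page spam island.
web = {
    "trusted.edu": ["a.com"],
    "a.com": ["b.com", "trusted.edu"],
    "b.com": ["a.com"],
    "spam1.biz": ["spam2.biz"],
    "spam2.biz": ["spam1.biz"],
}
```

On this toy graph, uniform `pagerank(web)` still hands each spam-island page a share of rank, while `pagerank(web, teleport={"trusted.edu": 1.0})` gives them none, because no trusted page links into the island. That is the intuition behind targeting “spam islands.”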

Google has applied for a patent, “Information Retrieval Based on Historical Data.” It identifies areas that can be targeted, including links, site registration data, and user data (clicks, time on site, etc.). The source and speed of link gains may be a “flag.”
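As a rough illustration of how the *speed* of link gains could act as a flag, here is a hypothetical sketch that compares each week’s new-link count to the median of the preceding weeks. The window, ratio, and minimum-count values are invented for illustration, and this is not the method described in the patent application; it also ignores link *source*, which the patent mentions as well.

```python
from statistics import median


def flag_link_spikes(weekly_new_links, window=4, ratio=5.0, min_links=20):
    """Return indices of weeks whose new-link count jumps far above the
    median of the preceding `window` weeks.

    window, ratio and min_links are hypothetical tuning values.
    """
    flagged = []
    for i in range(window, len(weekly_new_links)):
        baseline = median(weekly_new_links[i - window:i])
        count = weekly_new_links[i]
        # Require both an absolute floor and a large relative jump.
        if count >= min_links and count > ratio * max(baseline, 1):
            flagged.append(i)
    return flagged


print(flag_link_spikes([3, 4, 2, 5, 4, 3, 120, 6]))  # -> [6]
```

A flag like this would only mark a site for a closer look; it says nothing on its own about whether the links are legitimate.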

The future: social and personalized search, refining and adding information sources, greater individual attention to links and sites, and improved detection of “manipulable” areas. What do experts think is most important? seomoz.org/articles/search-ranking-factors.php lists the factors in order of importance.

Bill Slawski – SEO by the Sea, Inc.

He talks about yesterday’s “vertical creep into search results” session and the audience question “how did they get there?” (the vertical listings). The answer is obviously algos. Goals: learn how search works, build sites that rank well, find good questions, understand the limitations of SEs, and anticipate the future of search.

Things to use when researching: Primary sources: SE guidelines, patents and patent applications, papers from SE employees, and official and unofficial blog posts (googleresearch.blogspot.com, for example). Secondary sources: academic papers and trusted commentators. Other sources: forums, articles, newsletters, and experimentation.

Evaluating trustworthiness: the Stanford Persuasive Technology Lab has a bunch of good guidelines on its site about how to make your site look “credible.” For example, place your address on the site, and do not place it within an image. When evaluating a patent, ask: What problem does it claim to address? Who are the authors? Does it cite other sources? Are there related solutions? What approaches do other search engines take (are they even doing anything about the stated problem)? Are there opportunities to experiment? You need to ask these when looking at patent apps. He grins and asks: “Does Google release a patent app simply because it wants Yahoo and MSN to spend resources trying to emulate it?” (laughs)

One recent patent app discusses assigning geographic locations to pages: US Patent application 20050182770. It describes the ability to favor websites from specific locations instead of directories that list those sites. Would the SEs want this? Of course they do: they want results that are immediately useful to the searcher instead of making them step through a directory. The writers of this patent app are two brothers from Australia who work at Google in DC. They have also released other patent apps, many dealing with Google Maps. Their former company, Where 2 Technologies, was acquired by Google.

Quick note to the SEs: “Denzel Washington” and “Kentucky Fried Chicken” ARE NOT geographic locations. (laughs)
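The joke points at a real pitfall of assigning locations to pages: naive place-name matching fires on any string that contains a place name. Here is a minimal, hypothetical sketch with a tiny made-up gazetteer and a made-up list of known non-place phrases; real systems would use large place databases and far richer context than a blocklist.

```python
import re

# Hypothetical mini-gazetteer; a real system would use a large place database.
PLACES = {"washington", "kentucky", "new york", "boston"}
# Hypothetical phrases that contain place names but are not places.
NOT_PLACES = {"denzel washington", "kentucky fried chicken"}


def naive_places(text):
    """Match any gazetteer entry appearing as whole words in the text."""
    lowered = text.lower()
    return sorted(
        p for p in PLACES if re.search(r"\b" + re.escape(p) + r"\b", lowered)
    )


def filtered_places(text):
    """Blank out known non-place phrases before matching place names."""
    lowered = text.lower()
    for phrase in NOT_PLACES:
        lowered = lowered.replace(phrase, " ")
    return naive_places(lowered)


sample = "Denzel Washington ate Kentucky Fried Chicken in Boston."
print(naive_places(sample))     # -> ['boston', 'kentucky', 'washington']
print(filtered_places(sample))  # -> ['boston']
```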

Jon Glick – Become.com

Detlev introduced Jon as having been with AltaVista through lots of changes; he has now helped to start Become.com. Algos: what is the stuff that will actually impact rankings? Some patents are filed to confuse competitors or to make them do unneeded research (as Bill said). Remember that the engines do not have to use all the features mentioned in a patent, and conversely, they don’t have to put everything they will use into a patent. So you need to ask what will be used and what won’t.

“New” ranking tools that may be in use now or in the future: CTR (click-through rate) as an organic ranking factor. None of the SEs could use this as a straight boost because it would be really easy to spam. The first uses of CTR by the SEs will more likely be for demotion only.
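The “demotion only” idea can be sketched as a one-way adjustment: a result whose CTR falls far below what its position normally gets is pushed down, but a high CTR never lifts a result, so inflating clicks buys nothing. Everything here is hypothetical: the expected-CTR-by-position table, the thresholds, and the penalty factor are made up for illustration.

```python
# Hypothetical expected CTR by result position (1-based); illustrative numbers.
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}


def ctr_adjusted_score(score, position, clicks, impressions,
                       min_impressions=1000, floor_ratio=0.25, penalty=0.5):
    """Demote (never boost) a result whose CTR is far below what its
    position normally gets. All thresholds are hypothetical."""
    if impressions < min_impressions or position not in EXPECTED_CTR:
        return score  # not enough evidence; leave the ranking untouched
    ctr = clicks / impressions
    if ctr < floor_ratio * EXPECTED_CTR[position]:
        return score * penalty
    return score  # a high CTR never raises the score


print(ctr_adjusted_score(10.0, 1, 10, 2000))    # -> 5.0 (demoted)
print(ctr_adjusted_score(10.0, 1, 1500, 2000))  # -> 10.0 (no boost)
```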

Time spent on a site? This is used to flag sites where users hit the back button almost immediately: the page may be a 404, or it may be clearly off topic. Boosting rankings for final destination sites? Actually, SEs “prefer” (ask Jon for a better explanation of this) sites that do lead to an eventual visit back to the SE results. For example, Britney Spears searchers usually go to a site, check out some pictures (or song lyrics, if you are me, in case my wife is reading this), and then go back to the listings to see another site.
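Here is a minimal sketch of the “back button almost immediately” signal. It measures the share of a result’s clicks where the user returned to the results page within a short dwell time. Longer visits followed by a return (the Britney Spears pattern) are deliberately not counted against the site. The `Click` record and the 10-second threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Click:
    url: str
    dwell_seconds: float  # time on the site before returning to the results
    returned: bool        # did the user come back to the results at all?


def quick_bounce_rate(clicks, url, threshold=10.0):
    """Fraction of clicks on `url` that bounced back within `threshold`
    seconds (a hypothetical cutoff). Long visits that later return to the
    results are treated as normal browsing, not as a bad signal."""
    visits = [c for c in clicks if c.url == url]
    if not visits:
        return 0.0
    quick = sum(1 for c in visits if c.returned and c.dwell_seconds < threshold)
    return quick / len(visits)


log = [
    Click("a.com", 3, True),
    Click("a.com", 2, True),
    Click("a.com", 120, True),   # read for two minutes, then browsed on
    Click("b.com", 200, False),  # never came back
]
print(quick_bounce_rate(log, "a.com"))  # two of three visits were quick bounces
```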

SEs do keep a history of sites, so updated content is good. Tricks like changing the timestamp just to get a quicker re-crawl won’t work: duplicate detection technologies are used to find meaningful changes to a site.
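Why doesn’t a timestamp tweak fool the engines? One well-known family of duplicate-detection techniques compares word shingles (overlapping runs of words) between two versions of a page. A minimal sketch follows, assuming 4-word shingles, Jaccard similarity, and a hypothetical 0.9 cutoff for “meaningful change”; which techniques and thresholds any given engine actually uses is not something the session stated.

```python
import re


def shingles(text, w=4):
    """Set of overlapping w-word sequences from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def meaningful_change(old, new, w=4, threshold=0.9):
    """True if the two versions share less than `threshold` of their shingles
    (the threshold is a hypothetical value)."""
    return jaccard(shingles(old, w), shingles(new, w)) < threshold


body = " ".join(f"word{i}" for i in range(200))
old_page = "Last updated 2006-02-27. " + body
new_page = "Last updated 2006-02-28. " + body
print(meaningful_change(old_page, new_page))  # -> False: only the date moved
```

With a 200-word body, changing only the date alters a handful of shingles, so the pages still look nearly identical; rewriting the body drops the similarity toward zero.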

Most SEs limit how quickly a site can gain connectivity (the sandboxing and link aging topics). A sudden jump in links can draw scrutiny from the spam cops; if the links are legit, you’ll be OK.
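One simple way to picture “limiting how quickly a site can gain connectivity” is link aging. In this version, each inbound link’s credit ramps up with its age, so a burst of brand-new links counts for little at first. This is a hypothetical sketch: the linear ramp and the 90-day period are invented, and the speakers did not describe any engine’s actual formula.

```python
def aged_link_credit(link_ages_days, ramp_days=90):
    """Total credit from inbound links, where each link's weight ramps
    linearly from 0 (brand new) to 1 (at least ramp_days old).

    The linear ramp and ramp_days are hypothetical choices."""
    return sum(min(max(age, 0) / ramp_days, 1.0) for age in link_ages_days)


print(aged_link_credit([0] * 100))     # 100 links gained today -> 0.0
print(aged_link_credit([90, 180, 45]))  # -> 2.5
```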

“Tagging” is unlikely to be used: it is easy to keyword-stuff. Inbound anchor text offers the same benefits with better source validation. None of the major SEs use the meta keywords tag anymore. However, tagging is still very useful for multimedia rankings (video search and podcast search, for example).
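The “better source validation” point can be illustrated by counting, for each anchor-text term, how many *distinct* linking hosts use it. A page owner can stuff tags freely, and one site repeating the same anchor many times adds nothing here, while support from many independent hosts is harder to fake. This is a hypothetical sketch. It counts hostnames rather than registrable domains and uses plain whitespace tokenization.

```python
from collections import defaultdict
from urllib.parse import urlparse


def anchor_term_support(inbound_links):
    """inbound_links: iterable of (source_url, anchor_text) pairs.

    Returns term -> number of distinct linking hosts that used the term,
    so repeated anchors from a single host count only once."""
    hosts_per_term = defaultdict(set)
    for source_url, anchor in inbound_links:
        host = urlparse(source_url).netloc.lower()
        for term in anchor.lower().split():
            hosts_per_term[term].add(host)
    return {term: len(hosts) for term, hosts in hosts_per_term.items()}


links = [
    ("http://a.com/1", "cheap flights"),
    ("http://a.com/2", "cheap flights"),
    ("http://a.com/3", "cheap flights"),
    ("http://b.org/x", "flights to boston"),
]
print(anchor_term_support(links))
# -> {'cheap': 1, 'flights': 2, 'to': 1, 'boston': 1}
```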

Evaluation of outlinks: SEs are starting to look at outlinks; Google and Yahoo use them for spam detection. A couple of notes: ad units don’t count, and some SEs may increase rank slightly if you link to authoritative sites. Be careful whom you trade links with.
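A “bad neighborhood” check on outlinks might look roughly like this hypothetical sketch. It computes the share of a page’s editorial outlinks that point at hosts already judged to be spam, and it skips links to known ad hosts, since ad units don’t count. The spam and ad host sets are assumed inputs; how an engine would build them is outside what the session covered.

```python
from urllib.parse import urlparse


def bad_neighborhood_ratio(outlinks, spam_hosts, ad_hosts=frozenset()):
    """outlinks: list of absolute URLs found on a page.

    Returns the fraction of non-ad outlinks pointing at known spam hosts.
    spam_hosts and ad_hosts are hypothetical, externally supplied sets."""
    hosts = [urlparse(u).netloc.lower() for u in outlinks]
    editorial = [h for h in hosts if h not in ad_hosts]  # ad units don't count
    if not editorial:
        return 0.0
    return sum(1 for h in editorial if h in spam_hosts) / len(editorial)


page_links = [
    "http://spam1.biz/",
    "http://spam2.biz/x",
    "http://en.wikipedia.org/wiki/Oak",
    "http://ads.example.net/click?id=1",
]
ratio = bad_neighborhood_ratio(
    page_links, spam_hosts={"spam1.biz", "spam2.biz"}, ad_hosts={"ads.example.net"}
)
print(ratio)  # two of the three editorial links point at spam hosts
```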

Use of personal data: information sources for SEs include user registration, search history, and Yahoo Groups and the like, which can indicate interests. However, what can they do with all this data, and how can it be applied to an algo? Also, multiple users of the same machine may cause problems. There is serious concern about both privacy and perception. Zip codes and IP addresses may be used to improve local results, as Bill and Rand mentioned; this is called the “entity extraction” process. So once again, make sure you place your address in the footer.

Q&A: Re: local search, does adding location info beef up local rankings and in turn remove some nationwide results? Jon says not really; it will probably just boost your local rankings. You can also help by placing your local phone number on the page (not just the “800” number) so that the SEs can geo-identify you.
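The advice about putting your address and a local phone number on the page can be illustrated with a deliberately naive extraction sketch. It pulls US ZIP codes and phone area codes from page text and discards toll-free prefixes, which carry no location. The regular expressions are simplified assumptions (for example, a five-digit street number would be mistaken for a ZIP code). The toll-free list is the set of North American toll-free prefixes.

```python
import re

# Simplified patterns (assumptions): 5-digit ZIP or ZIP+4, and
# US-style phone numbers such as (212) 555-0123 or 800-555-0199.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-. ]?(\d{3})[-. ](\d{4})\b")
# North American toll-free prefixes: these say nothing about location.
TOLL_FREE = {"800", "833", "844", "855", "866", "877", "888"}


def local_signals(text):
    """Return ZIP codes and non-toll-free area codes found in the text."""
    zips = ZIP_RE.findall(text)
    area_codes = [m.group(1) for m in PHONE_RE.finditer(text)]
    local = [a for a in area_codes if a not in TOLL_FREE]
    return {"zips": zips, "local_area_codes": local}


footer = "Acme Bakery, 350 Fifth Ave, New York, NY 10118. Call (212) 555-0123 or 1-800-555-0199."
print(local_signals(footer))
# -> {'zips': ['10118'], 'local_area_codes': ['212']}
```

Only the 212 number gives a geographic hint here; the 800 number is filtered out, which mirrors Jon’s point about not relying on the “800” number alone.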

If you have many reciprocal or paid links that are “fair,” should you be worried? Rand almost feels that you should stay away from active link building: do active PR, do active marketing, do promote, but DON’T go pay for links or go to directories unless you are “out of the new site penalty period.” Jon: make sure the links you get have good, targeted keyword anchor text. (Hey guys, how about leaving something for us in “Linking Strategies” tomorrow? :p) Detlev adds: it’s all about context.

Ways to test? A good idea is to use a throwaway domain if you are really worried, since SEs will usually give only one warning before banning. Detlev adds that you should read the site content out loud and make sure it makes sense. Don’t ruin your brand with a horrid message just for the sake of SEO.

An attendee was worried about moving a site with great UK rankings to a US host. The speakers all seemed to agree this won’t be a problem, because when the engines see the site they will notice the content has remained the same. The fact that the IP address has changed will probably not hurt you. If you switch the registration of the site, however, you may lose some UK rankings.

This is part of the Search Engine Roundtable Blog coverage of the New York Search Engine Strategies Conference and Expo 2006. For other SES topics covered, please visit the Roundtable SES NYC 2006 category archives.
