Search Algorithm Research

Aug 8, 2006 • 7:24 pm | comments (2) | Filed Under Search Engine Strategies 2006 San Jose
 

Detlev is the mod.

Rand Fishkin from SEOMoz is up first. Yesterday morning Chris Sherman said the algorithms of search engines have reached their max, and Rand says most would tend to agree. He says let's look at the last two years of link analysis: What is a manipulative link? What algorithmic techniques are the engines using to combat them? Then he will look at some solutions.

Manipulative links? He shows a link to free poker tournaments on a blog, placed via comment spam, where the nofollow attribute wasn't used. If you go to Technorati and add your blog links, is that a manipulative link? He then mentions the DP Coop links and says they don't work now. He then shows the W3C page that had bought links on it. He shows a little diagram of ways to detect spam links. The DMOZ directory clones are another method that has been dropped.

He then brings up the Google Trends page: the term "zillow" had a major spike, which tells Google that these are legitimate searches and that a matching level of link growth can be expected. They also have Google Analytics, and they can (they say they don't) use that data for ranking purposes. You also have free WiFi in San Francisco, but what will they do with that data? Google has manual link identification and site identification systems.

He then brings up the SandBox: "throwing out the baby with the bath water." Not all sites have to go through the SandBox, e.g. Zillow. He shows an example of the SandBox at Google for a search on Bill's SEO By The Sea. The SandBox is about trust, Rand says. Launching with a bang is a good way to bypass the SandBox. Building links slowly and naturally will also help. Link building practices: links via email (not automated), gaining attention, link baiting, trust sites, regional sites, press releases. Subdomain issues; non-English sites do well; and wikis, Technorati, MySpace and other web 2.0 sites also get around this. He then shows off some SEOMoz.org resources.
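To make the comment spam example concrete, here is a minimal sketch of how one might check which links in a blog comment lack the nofollow attribute. It is not anything Rand showed; the class name `LinkAuditor`, the helper `followed_links` and the sample comment are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collects every <a href> and records whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="external nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followed_links(html):
    """Return the hrefs that do not carry rel="nofollow"."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return [href for href, is_nofollow in auditor.links if not is_nofollow]


# Hypothetical comment: one spam link without nofollow, one link with it.
comment_html = (
    '<p>Nice post! <a href="http://example.com/poker">free poker</a> '
    '<a rel="nofollow" href="http://example.org/">my site</a></p>'
)
print(followed_links(comment_html))
```

Run on the sample comment, only the poker link is reported, since it is the one a search engine would be free to count.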

Bill Slawski from SEO By The Sea is up next (he's a big mod at the Cre8asite forums). This should be pretty technical...

Stage one: one size fits all. Search engines index the web, match queries, return results; that is how it was in the 90s. Stage two: understanding the users. Search engines index the web, analyze queries, collect user info, match it with intentions and return results. Stage three: understanding people. Search engines index the web, analyze queries, map people's interests and return results based on that. He said that is what makes the MySpace Google deal interesting.

He has a list of some of the stage two papers released, some by AltaVista, Excite, etc. In stage two we see user behavior measured a lot: the historical data patent application, bookmark manager patent application, web accelerator patent application and Google Suggest patent application. How does Google collect user data? Personalized search and news, toolbar, ISP data, Gmail, etc. Retroactive answering of search queries: Google Alerts that look at your search history and show you news based on it (wow).

On to stage three, where Google understands people. InterestMap: Harvesting Social Network Profiles for Recommendations: "as recommender systems become more central to people's lives, we must start modeling the person, rather than the user." "Why should recommenders be restricted to data gathered within the context of the application," i.e. partnering with other companies. InterestMap identifies interests from social profiles, maps passions, merges interests and passions with detail...

Achieving results in the age of personalization: ranking reports don't help, log file analysis doesn't help as much as it did, you need to know your users better. He asks, when a customer buys from you on your e-commerce site, do you Google those people? (My comment: who ever thinks that way? But he makes a good point.)

Bill adds that he spends a lot of time reviewing patent applications, and published summaries of them late last night at the SEW Blog. He mentions one from Outland Research, about user interest and information: ranking results based on things like gender and sex. (Bill, excellent job keeping this talk easy to understand.)
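The idea of mapping interests from social profiles can be illustrated with a small sketch. This is not the InterestMap authors' method and not anything Bill presented; it swaps in plain co-occurrence counting, and the function names and sample profiles are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(profiles):
    """Count how often each pair of interests appears in the same profile."""
    pairs = defaultdict(Counter)
    for interests in profiles:
        for a, b in combinations(sorted(set(interests)), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def recommend(pairs, known_interests, top_n=3):
    """Rank interests that co-occur most with the ones a person already lists."""
    scores = Counter()
    for interest in known_interests:
        for other, count in pairs.get(interest, {}).items():
            if other not in known_interests:
                scores[other] += count
    return [name for name, _ in scores.most_common(top_n)]


# Hypothetical social profiles, each a list of self-declared interests.
profiles = [
    ["hiking", "photography", "travel"],
    ["hiking", "travel", "cooking"],
    ["photography", "travel"],
    ["cooking", "wine"],
]
pairs = build_cooccurrence(profiles)
print(recommend(pairs, {"hiking"}, top_n=1))
```

In this toy data "travel" appears alongside "hiking" most often, so it is the top suggestion for someone who only lists hiking. A real system would need normalization so that very common interests do not dominate every recommendation.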

Jon Glick from Become.com is last up. He asks: of all the stuff being put out there, what is actually being used? A patent is really a trade: they get exclusive rights to use this information for a number of years. Caveats about what's in patents: (1) red herrings (not going to be used, but you know), (2) trade secrets (like the SandBox) are not in patents, (3) rumors, and (4) missed it.

How fast you change your content has an impact; dates tracked include registration, when they first found the site, most recent crawl, and the last time you changed the content on the site. They look for meaningful change (not just date changes). When a site moves IP addresses, it is often re-evaluated, since the move may indicate new ownership or a change in parked status.

Rate of change of links is tracked: most search engines limit how quickly a site can gain connectivity (sandbox, link aging); a sudden jump in in-links can draw scrutiny from the spam cops (joining a major link network, interlinking of lots of domains); there are exceptions for sites with spikes: editorial review, lots of accompanying news/blog posts and lots of web searches.

Tagging: unlikely to be used by the major SEs because it is easy to keyword stuff, anchor text offers the same benefit with better source validation, and SEs experienced this with the keyword meta tags; Google is experimenting with a closed system (Google Coop). It is very useful for multimedia content (images, video, audio, etc.), and it hasn't been heavily spammed.

Quality scores: editorial quality is now part of paid listings; Yahoo will also have it in their Panama release; any pages you submit will go through this review; it will most likely not impact your organic rankings.

Evaluation of outlinks: traditionally engines looked at who links in to you and not who you link out to, but now search engines are looking at who you link out to; they use this for spam evaluation and give you a spam score to change the trust level of your site. AdSense links do not count against you (they go through redirects as well), and the nofollow attribute may also make a difference.

Use of personal data: it is not used heavily right now but may be soon; information sources include user registration, search history, Yahoo Groups, Gmail, etc. Hurdles to search "getting personal": SEs are not sure what to do with all that information, multiple users/machines, people tend to search for what they don't know, and serious concerns with both privacy and perception. Zip codes/IP may be used to improve local results; SEs already extract local info from web pages, and AdWords and YSM support geo-targeted bidding.

Unusual stuff that actually does matter includes URL length (short URLs are more authoritative, and there are crawl depth limits) and RSS feeds (the theory is that if you have RSS feeds you probably have fresh content; sites with RSS feeds are crawled more frequently, especially by Yahoo, and you also get an extra line in your search result). Yahoo uses over 80 factors in ranking, and small changes happen with almost every new index. Content, connectivity and outside opinion all matter.
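Jon's point about the rate of change of links (a sudden jump in in-links draws scrutiny unless it comes with news, blog posts or search volume) can be sketched roughly as follows. No search engine's actual method is known from the talk; the function name, the four-week window, the 5x ratio and the sample numbers are all assumptions for illustration.

```python
from statistics import mean


def flag_link_spikes(weekly_new_links, window=4, ratio=5.0, corroborated_weeks=()):
    """Return indexes of weeks whose new in-link count jumps far above the trailing average.

    weekly_new_links: new in-links discovered per week, oldest first.
    window: number of prior weeks used as the baseline (assumed value).
    ratio: multiple of the baseline that counts as a spike (assumed value).
    corroborated_weeks: weeks with matching news/blog/search spikes, which are exempt.
    """
    exempt = set(corroborated_weeks)
    flagged = []
    for week in range(window, len(weekly_new_links)):
        baseline = mean(weekly_new_links[week - window:week])
        # Floor the baseline at 1 so a site with no prior links still needs
        # more than `ratio` new links before it is flagged.
        threshold = max(baseline, 1.0) * ratio
        if weekly_new_links[week] > threshold and week not in exempt:
            flagged.append(week)
    return flagged


# Hypothetical history: steady growth, then a sudden jump in week 5.
history = [10, 12, 9, 11, 10, 400, 15]
print(flag_link_spikes(history))                          # spike with no outside signal
print(flag_link_spikes(history, corroborated_weeks=[5]))  # spike backed by news/searches
```

The `corroborated_weeks` argument stands in for the exceptions Jon listed: if the jump coincides with editorial review, press coverage or a surge in searches, the week is not flagged.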

 

Comments:

Bill

08/09/2006 01:41 am

Thanks, Barry. :) That was one of my big concerns - trying to keep what could be complex simple enough to get across to a large audience.

Bernt Johansson

08/09/2006 09:20 am

Interesting reading, but you really ought to increase line-height, in your copy. It gets really hard to read off the screen when there is so much text. I'd go for 140%.
