One of our most popular PubCon sessions, this event is also known as the Search Engine Smackdown.
Expect a "State of the Engines" address by the leading search engines of today. Yahoo, Google, Ask and Microsoft will all run down the current status, features, and fresh offerings of their respective search spaces.
Related blog entry from a few years back: http://www.pubcon.com/blog/index.cgi?mode=viewone&blog=1156867200
Moderator: Brett Tabke
Speakers:
Matt Cutts, Software Engineer, Google Inc.
Sean Suchter, VP, Yahoo! Search Technology Engineering, Yahoo!
Nathan Buggia, Live Search Webmaster Central, Lead Program Manager, Microsoft
Nathan Buggia: State of Live Search - what does it mean for publishers? We've talked about the themes of Live Search - deliver the best search results, simplify key tasks, and innovate in the business model.
Best search results: it's all about relevance. We've made a lot of progress. Does a query answer your question? We've been tracking this for 4 years. Over the past year, we've gotten into the same ballpark - not exactly like Yahoo/Google but very similar. We're better on some queries, but others aren't perfect. It's about freshness of content and depth of content.
Specific improvements: better crawling performance through compression and If-Modified-Since support. We create less load on your server and crawl more efficiently. If your resources are gzipped, we use less bandwidth.
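To make the crawling point concrete, here is a minimal sketch of how a server can answer a crawler that sends If-Modified-Since and Accept-Encoding: gzip. It is written from the publisher's side; the function name `respond` and its parameters are illustrative assumptions, not anything Live Search or MSNBot exposes.

```python
import gzip
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(body: bytes, last_modified: datetime, request_headers: dict):
    """Return (status, headers, payload) for a GET request.

    Hypothetical helper: honors If-Modified-Since and Accept-Encoding: gzip.
    last_modified must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= parsedate_to_datetime(since):
                return 304, headers, b""  # unchanged: send no body at all
        except (TypeError, ValueError):
            pass  # unparseable or naive date: fall through to a full response

    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers["Content-Encoding"] = "gzip"
        return 200, headers, gzip.compress(body)
    return 200, headers, body
```

An unchanged page costs the server only a 304 with an empty body, and a changed page goes out compressed, which is where the bandwidth saving comes from.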
Standardization of REP rules - these are a core set of rules for the Robots Exclusion Protocol. It's easier for publishers to specify their policies for search engines. These rules are shared across the engines. MSNBot has adopted the common set of rules: we now support regular expressions.
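As a rough illustration of pattern support in robots.txt rules, the sketch below matches URL paths against Disallow patterns. It assumes the narrower wildcard syntax commonly described for the shared REP rules ('*' for any run of characters, a trailing '$' to anchor the end of the path) rather than full regular expressions; the function names are hypothetical.

```python
import re


def rule_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed syntax: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)
```

For example, `is_blocked("/docs/report.pdf", ["/*.pdf$"])` is true while `is_blocked("/docs/report.pdf.html", ["/*.pdf$"])` is false. The sketch ignores Allow rules and precedence between competing rules, which a real crawler also has to handle.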
We continue to invest in sitemaps. They can be hosted anywhere. There's a lot of flexibility for publishers. It also helps understand canonicalization issues.
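For publishers who have not built a sitemap before, here is a small sketch that generates one with the Python standard library, using the sitemaps.org 0.9 namespace. The helper name `build_sitemap` and the example.com URLs are placeholders. Listing only the preferred form of each URL is one way a sitemap can hint at canonicalization.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Optional date of last change, e.g. "2008-11-12".
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2008-11-12"),
    ("http://www.example.com/about", None),
])
```

The resulting string can be saved as sitemap.xml and referenced from robots.txt with a `Sitemap:` line.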
There's a significant increase in crawling capacity.
We also realized that delivering the best search results isn't only about algorithmic improvements. It's also about providing tools: Webmaster Tools. We offer troubleshooting tips. We took a list of the top issues that Live Search encountered when crawling websites - 404 errors, too many parameters, blocked by robots, and unsupported content. Reporting is provided around these, along with filtering. Next week, we'll launch a new feature about malware. We scan every page and see what spawns a malicious process; those pages are flagged and cannot be clicked on in the Live Search user experience. Publishers can find their own infected links in the tools; they can also get a list of outbound links that are infected.
We also provide a lot of tools around ranking. Information is provided on Static Rank, dynamic ranking within site, backlinks, and penalties.
Issues can also be raised on the community forums, which have a 3 day turnaround.
Another tool, launched about a year ago, is the adCenter Excel Keyword Research tool. It gives you access to an API that provides keyword data for Live Search - demographic and monetization information.
Simplify key tasks: the future of relevance? We found that people come to search engines for many different use cases. Sometimes they're doing navigational queries. Sometimes they don't know what they want; these are exploratory scenarios. We provide richer media in the search results in addition to the 10 blue links. Also, deeper pages may be related to the topic rather than to the search experience itself. As a publisher, there's more surface area for reaching customers with specific content. Some of this is video and structured content (products, reviews, and more information about your website). This is being expanded into Hotmail and other properties as well.
Innovation in the business model: We're talking about the Cashback/adCenter scenario.
We also have Project Silk Road, which consolidates things around three goals:
- Increase engagement: enhance the site with Live Search results, customize 404 error pages with the error toolkit, and create a rich user experience with Virtual Earth and Silverlight.
- Generate traffic: optimize the site with the tools, form deep content partnerships that increase distribution, and use enhanced ad format solutions.
- Drive insight: learn how your website performs and who your customers are, through rich site statistics, monitoring, and optimization.
Within that, there's the Live Search API. We asked a lot of our partners about what they needed in an API. Publishers wanted to be in control of the results of the API. Now, you can reorder the results, skin results and ads to match your website or application, and filter out 300 ad providers that don't make sense (competitors, aren't good for your audience, etc.)
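The kind of control described here can be pictured as post-processing a result list before rendering it. The sketch below is not the Live Search API: the result fields ("type", "url", "provider") and the function name are hypothetical, chosen only to show filtering out unwanted ad providers and reordering results.

```python
def customize_results(results, blocked_advertisers, boost_domain=None):
    """Drop ads from blocked providers, then move boost_domain results to the top.

    `results` is a list of dicts with hypothetical keys:
    "type" ("web" or "ad"), "url", and "provider".
    """
    kept = [
        r for r in results
        if not (r["type"] == "ad" and r["provider"] in blocked_advertisers)
    ]
    # sorted() is stable, so items keep the engine's order within each group.
    return sorted(
        kept,
        key=lambda r: boost_domain is None or boost_domain not in r["url"],
    )


raw = [
    {"type": "web", "url": "http://other.example/a", "provider": None},
    {"type": "ad", "url": "http://rival.example/buy", "provider": "rival"},
    {"type": "web", "url": "http://mysite.example/b", "provider": None},
    {"type": "ad", "url": "http://ok.example/deal", "provider": "partner"},
]
page = customize_results(raw, {"rival"}, boost_domain="mysite.example")
```

Skinning the results to match a site would then be a matter of rendering `page` with the publisher's own templates.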
The technical aspects of the API also needed to meet business needs:
- The query limit is removed - queries are now unlimited.
- Rich query language - site operators that you've seen in the past (e.g. site:). You can alter how dynamic ranking relevancy favors freshness, accuracy, or whatnot.
- Many types of content - web, news, images, Encarta answers, spelling. Different corpuses in the backend are now accessible.
- Implements all standard protocols (REST, JSON, RSS, SOAP) - people can use the API whichever way they develop.
Sean Suchter: Yahoo is trying to get rid of the 10 blue links.
Limited choice: three players dominate the market. Neither site owners nor searchers can exert influence, so Yahoo is trying to address that.
The Search Assist feature is being developed to help users form the best possible search queries.
Right now, Yahoo is looking to move from "to do" to "done" - getting to the answer by reducing frustration, trying to structure information from the web directly, etc.
One example is the music player integration - "Play the web" in Yahoo Search.
He shows a SERP that shows many initiatives: rich media modules (video and headlines), deep links, and news federation.
The other big area is about the ecosystem. We're really trying to create a community around search (think PubCon). We're trying to set up incentives for everyone - Yahoo and end users. A few ways to do that: opening search (SearchMonkey) - coming from outside in. What does this mean? Yahoo wants to move from a simple presentation to a more useful structured presentation when appropriate for the task the user is trying to accomplish (not uniformly, not for all queries, not for all users). For site owners, this helps the users get right to the answers. The traffic should increase in quality. It hasn't hurt clickthroughs to your site. It will increase loyalty and engagement.
There is a lot of success with the SearchMonkey ecosystem. A lot of properties, including People magazine, Wikipedia, Trulia, WebMD, and more are utilizing it.
Another innovation is BOSS, a big initiative - Build your Own Search Service. The idea is to open the platform completely. Trying to be a principal search engine is a hard thing. You need hardware, data, and more. So the idea is to open it up completely so people can interact with the query handling and crawling and use it directly. The goal is a high quality search experience that is relevant, comprehensive, fresh, and well-presented.
Some examples: 4hoursearch - it was made in 4 hours by a guy who said he paid $10 for pizza and beer. It's very straightforward and a different type of search presentation. Another one is PlayerSearch, which is more specialized (like SportsCenter). NewsLine is another, with a cool layout for how the news is presented. Finally, Tianamo is a 4th - it presents the data in a sort of mountain format. It's a landscape of queries and the things surrounding them in a visualization.
Matt Cutts: State of the Index. What has happened in 2008 and what should we expect in 2009?
- Google Chrome is a wicked fast browser
- Google Android is an open source operating system
There's other stuff too - better machine translation, better voice recognition, Google Suggest, improving personalization and universal/blended search
There were a lot of small things: the 2001 search index, video and voice chat in Gmail, and the ability to track the flu (by finding out who is searching for flu/cough/cold symptoms on Google!) - it's really cool.
- Why is this interesting to webmasters? You don't have to do this only with flu. You can look at Google Trends in general and even check trends for websites.
Google Ad Planner slices and dices by demographic.
Let's drill down: what have we done for the webmaster? We're taking PDFs that are images and running OCR on them. We're crawling Flash better - pulling text out of the transitions in Flash files.
2008 Webmaster Launches. Look at pinkberry.com/mobile versus redmangousa.com/ on your iPhone. Only one works. Google is working to understand these Flash files that aren't showing up on your phone.
There are a few other things:
- Advanced segmentation in Google Analytics
- On-demand indexing for Google Custom Search Engine - Google will reindex up to 10 pages within 24 hours.
- Webmaster APIs for hosters and GData
- A translation gadget for your website. If you have Chinese visitors and you write in English, the site can be translated into Chinese.
Webmaster Communication - it's been huge so far. We've had 3 chats, with 700 people dialing in on the most recent one. We're blogging more, including more videos, and there are now blogs in different languages. If your site had malware or was caught for spam BEFORE you registered it, those messages will be waiting when you register.
- Yesterday, Google came out with a 30 page guide on SEO 101. This means Google values SEO.
2009 Blackhat trends:
- jeevesretirement.com was bought by Ask.com, but Ask forgot to renew it, and the domain was then bought by a porn operator. People grab expired domain names and take advantage.
- Illegal hacking will become more common.
- Blackhat moves toward the outright illegal - DNS subdomain hijacking. Without updated DNS resolvers, this kind of hack is possible. Do we want to do stuff that gets people in jail?
Conclusions - blackhat SEOs will continue to veer toward the outright illegal, SEOs need to decide their risk tolerance, Google will keep communicating with webmasters, and Google will provide tools to help webmasters.