Search Friendly Development

Jun 4, 2008 - 12:53 pm

Search Friendly Development - Highlights the most important elements to consider for search engine optimization (SEO) when building a web application infrastructure and provides tactical details about how to implement those elements. Topics include:

* Developing a crawlable infrastructure
* Considerations when developing rich internet applications (using technologies such as Flash, Silverlight, and AJAX)
* URL rewriting, redirection, canonicalization, and visitor tracking

Moderator: Vanessa Fox, Features Editor, Search Engine Land


Speakers:
Nathan Buggia, Lead Program Manager, Microsoft
Maile Ohye, Senior Developer Programs Engineer, Google
Sharad Verma, Sr. Product Manager, Web Search, Yahoo

Nathan from Microsoft is starting the day off and tells us the "Truth about SEO." There are a lot of big hard problems: affiliate tracking, session management, rich internet applications, duplicate content, geolocation, understanding analytics, redirection, error management, etc. HTTP is a stateless protocol. With search in the mix, designs built from 1995 to 2000 have more or less broken in 2005 and beyond. Cloaking isn't ideal. What differentiates advanced SEO from normal SEO is analytics. Being an advanced SEO means you have more experience or are at a larger company, but you need to make sure all the appropriate things are instrumented and use that data to guide your reasoning. Do not implement something because someone told you on a panel that it's a good idea. He doesn't recommend PageRank sculpting. But everything on the web is an opportunity cost. A competitor might be doing what you aren't doing.

Watch out for complexity as well. That's something that a lot of people get caught up on. If you build cloaking or conditional redirects into your website, it gets very complex. Multiple URLs will cause problems; you have to track 404s, getting rankings, etc. All these variations are complex. It's hard to find problems.

Look for the simplest architecture possible to solve the problem for agility.

Microsoft says cloaking is not all bad but it's not the second or third solution that they recommend. Every search engine says don't do it, though. Try it with caution.

All websites have the same first problem: accessibility. That's where people should start especially if there's no analytics in place to tell you. Can crawlers access the site? Do you have Flash or Silverlight or 301s or 302s or images? It's a simple topic but Microsoft has a team of SEOs who focus on the top websites and these "101" problems are still problematic there.

Take a look at the main content: title tags, H1 headings, and whether the content actually exists on the page. Look at canonicalization. People look to link building campaigns, but another way to approach the same problem is to look at canonicalization: do you have 5 URLs pointing to the same page, dividing its reputation? That's an important indicator.

Search engines are always changing. People might say that the big thing this year is "siloing." That may change next week or next year (especially if Matt is in the audience). If it works for me, it may not work for you. The webmaster guidelines are always constant. All search engines agree on the same thing. Work with us instead of against us.

He uses Nike's site as an example so we can get the first run experience. The first thing you see is that Flash is loading. Flash loads, you select your language and region, and then it loads again. They play a video that runs for 1 minute. The first run experience is 8 seconds to get to the video. Nike is a brilliant company, he adds. They are great at brand marketing. "Just do it" is a marketing slogan but a cultural icon as well. If you approach the web with the same immersive experience, it may not work the same. The first run experience is 8 seconds, but maybe some visitors don't have 8 seconds. Maybe they have 1 second. Some people like to shave milliseconds off the page load time because that keeps the visitors there. There's 3 seconds of caching a cookie as well. If you don't have time to be immersed, it's not great. If you have a mobile device, it's not great. If you're blind, it's not great. If you have ADHD, you can't wait that long.

It's also not great for search. The HTML behind the page shows the title tag is there: it's "". They're cloaking (which is well known). The search engines see much better content. Nike has over 2 million pages on their website and they're not cloaking everything, but they're cloaking a lot. When their cloaking broke, people didn't notice because they weren't crawling the site the way search engines do. Cloaking is hard and complicated.

Opportunity cost and analytics: every investment you make is another investment that you can't make. If you're investing in cloaking, others may not be investing in cloaking and they may be affiliates of your company. A lot of websites for "lebron james shoes" look like Nike websites but they're not. For every problem you have on the web, there are many possible solutions. We'll talk about opportunities and options for solving these problems.

He shows an alternative implementation. You put a rich object at the top and add JavaScript that populates the div. It gives you a rich-level experience and a down-level experience in the same web page.

Advanced SEO does not equal spam. SEO does equal good design.

It's a lot less expensive and more impactful to plan for SEO. Design for your customers, be smart about robots, and you'll enjoy long lasting success.

Next up is Sharad Verma from Yahoo. He talks about his past visit to Peru (Machu Picchu). He talks about how the Incas built the city on a mountain, but it was not discovered until 1911 because it was covered in dense forest. It was completely covered and hidden from view. But today, it's easy to get to. You can take a bus or walk there. Today, it's accessible and easily discovered. That's the emphasis of this talk. You need to serve your human users and your robots. No matter how you design your site, you need to consider people and robots.

Search machinery behind the bots: there are 3 cranks (3 fundamental processes) - crawling, indexing, and ranking. Webmaster control diminishes as you go from crawling to indexing to ranking. You have the most control over crawling and the least over ranking.

Since crawling is where you have the most control, let's find out how spiders crawl your website. They start with a URL, download the webpage, extract links, and download more webpages. Crawlers also find less visible links (JavaScript, forms, etc.), which takes more extraction work. Sometimes they see links and don't crawl them: some links may be disallowed in robots.txt, or the search engine doesn't consider them high enough priority.
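To make the crawl loop concrete, here is a minimal sketch that runs against a tiny in-memory site rather than the real web. The `PAGES` dictionary, the `DISALLOWED_PREFIXES` tuple (a stand-in for robots.txt rules), and the example.com URLs are all assumptions for illustration. Real crawlers add prioritization, politeness, and much more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": URL -> HTML body (no network access needed).
PAGES = {
    "http://www.example.com/": '<a href="/shoes/">Shoes</a> <a href="/private/admin">Admin</a>',
    "http://www.example.com/shoes/": '<a href="/shoes/running">Running</a> <a href="/">Home</a>',
    "http://www.example.com/shoes/running": "<p>No outgoing links here.</p>",
    "http://www.example.com/private/admin": '<a href="/private/secret">Secret</a>',
}

# Path prefixes the site owner excluded (assumed stand-in for robots.txt rules).
DISALLOWED_PREFIXES = ("/private/",)


class LinkExtractor(HTMLParser):
    """Collects href values from plain <a> tags, the links a simple crawler can see."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed):
    """Breadth-first crawl from seed; returns URLs in the order they were fetched."""
    queue = deque([seed])
    seen = {seed}
    fetched = []
    while queue:
        url = queue.popleft()
        if urlparse(url).path.startswith(DISALLOWED_PREFIXES):
            continue  # the link was seen but is excluded from crawling
        body = PAGES.get(url)
        if body is None:
            continue  # would be a 404 on a real site
        fetched.append(url)
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched
```

Calling `crawl("http://www.example.com/")` fetches the home page and the two shoe pages, and skips everything under `/private/` even though a link to it was found.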

How do search engines find your content?
- Organic inclusion from crawling, but it depends on links from reputable sites
- If you're not satisfied with the amount of crawling, there are feeds that let you submit the content.

Organic crawl: search engines are only taking baby steps toward understanding JavaScript and mostly cannot crawl it. Turn off JS in your browser and navigate your site. That's how you can tell if it works. (e.g. search with JS turned off).

In Flash, make sure your site is accessible by robots. Provide alternate navigation.

In dynamic URLs (many parameters), the biggest thing is that they're difficult to read. From a usability standpoint, they don't create a rich experience. They also lead to duplicate content and spider traps. Instead, create human-friendly, readable URLs. Use 301 redirects from dynamic URLs to static versions. Limit the number of parameters. Rewrite dynamic URLs using Yahoo! Site Explorer. He says that some people actually won't visit sites with complex URLs. I can understand that.
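As a rough illustration of redirecting a dynamic URL to a static version, the sketch below maps a hypothetical `/product.php?category=...&id=...` pattern to a readable path and returns a 301. The URL pattern, the parameter names, and the example.com host are assumptions, not anything shown in the session.

```python
from urllib.parse import parse_qs, urlparse


def static_redirect(url):
    """Return (status, location) for a dynamic product URL, or None if no rule applies.

    Hypothetical rule: /product.php?category=shoes&id=42 -> /shoes/42
    Extra parameters (session IDs and the like) are dropped.
    """
    parts = urlparse(url)
    if parts.path != "/product.php":
        return None
    params = parse_qs(parts.query)
    category = params.get("category", [""])[0]
    product_id = params.get("id", [""])[0]
    # Only redirect when both values look safe to place in a path.
    if not category.isalnum() or not product_id.isdigit():
        return None
    return 301, f"{parts.scheme}://{parts.netloc}/{category}/{product_id}"
```

For example, `static_redirect("http://www.example.com/product.php?category=shoes&id=42&sessionid=abc")` yields a 301 to `http://www.example.com/shoes/42`.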

He explains how to use Yahoo Site explorer to rewrite dynamic URLs where you can remove up to 3 parameters and then set up a default value. To get there, log into Site Explorer, go to Manage (for the domain), and then click on Dynamic URLs.

Duplicate content is essentially multiple URLs leading to the same content. The consequences are a less effective crawl and crawlers that are less likely to extract links from duplicate pages. You can 301 duplicate content to the canonical version or disallow duplicate content in robots.txt.
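One way to spot this on your own site is to group URLs by a hash of their content. The sketch below does that and builds a 301 map to one canonical URL per group. Choosing the shortest URL as canonical is an assumption made for illustration. The page bodies are made up.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose bodies are identical after whitespace normalization."""
    groups = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def redirect_map(duplicate_groups):
    """Map each non-canonical URL to its canonical target.

    Assumption: the shortest URL in a group is treated as the canonical one.
    """
    redirects = {}
    for urls in duplicate_groups:
        canonical = min(urls, key=lambda u: (len(u), u))
        for url in urls:
            if url != canonical:
                redirects[url] = canonical
    return redirects


pages = {
    "http://www.example.com/shoes": "<h1>Shoes</h1>",
    "http://www.example.com/shoes?sort=default": "<h1>Shoes</h1>\n",
    "http://www.example.com/shoes/index.html": "  <h1>Shoes</h1>",
    "http://www.example.com/hats": "<h1>Hats</h1>",
}
redirects = redirect_map(find_duplicates(pages))
```

Here the two variant shoe URLs end up mapped to `http://www.example.com/shoes`, and the hats page is left alone.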

Best practices:
- Flatten your folder structure (e.g. instead of
- Redirect old pages to the new pages with 301 and 302
- Use keywords in URLs
- Use subdomains when appropriate
- Remove file extension from URL if you can
- Consistently use canonical URLs for internal linking
- Promote your critical content closer to the home page

Feed based crawling - the sitemaps protocol. Tell the crawler where to find all the pages on your site, especially deep content. See Try to use all the metadata that is supported by the protocol.
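For reference, a sitemap is an XML file with one `url` entry per page. The protocol's optional metadata fields are `lastmod`, `changefreq`, and `priority`. The sketch below builds one with the standard library. The example.com URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from dicts with 'loc' plus optional metadata keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit child elements in the order the protocol lists them.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2008-06-04",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/shoes/running"},  # deep content
])
```

Listing deep pages like the second entry is the main point: it tells the crawler about content it might not reach by following links.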

Tabasco is Yahoo's secret sauce.

Robots exclusion protocol lets you tell search engines what to crawl and what not to crawl: printer-friendly pages, duplicate content, folders that you don't want users to see (exception: CSS). The crawl-delay is another tool you can use: it sets how long the crawler waits between visits. If the delay is 900, the crawler will visit your site once every 900 seconds. You want to be careful about this. It supports fractions. You can start with a lower value and then raise it if necessary based on crawler behavior.
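You can check how a set of robots.txt rules will be read before deploying them. This sketch uses Python's standard `urllib.robotparser` on a made-up rule set. The paths and the delay value are assumptions. As far as I know, this particular parser only reads whole-number `Crawl-delay` values, so it will not show the fractional delays mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block printer-friendly and private folders,
# and ask crawlers to wait 5 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
# "Slurp" is the user agent name of Yahoo's crawler.
print(rules.can_fetch("Slurp", "http://www.example.com/print/shoes.html"))
print(rules.can_fetch("Slurp", "http://www.example.com/shoes/"))
print(rules.crawl_delay("Slurp"))
```

The first check reports that the printer-friendly page is blocked, the second that the regular page is allowed, and the last returns the delay that applies to that crawler.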

Robots protocol also supports noindex, noarchive, nosnippet, nofollow, nocontent, noydir, and one other one that I didn't get.
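Several of these directives are commonly set per page in a robots meta tag. As a small sketch, the parser below reads whatever directives a page declares in `<meta name="robots">`. The sample HTML is made up. The code does not check whether a given engine honors each value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For a page whose head contains `<meta name="robots" content="noindex, nofollow">`, `robots_directives` returns a set with `noindex` and `nofollow`.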

Site Explorer is very useful for exploring information such as inlinks, sitemaps, etc.

Search engines want your content. Break down accessibility barriers, let the crawlers in, and they'll do their job.

Maile from Google shows how to enhance your website. Her talk covers three areas: crawling (maximizing your site's accessibility to search engines), indexing, and search results (pretty results that people will want to click on).

Crawlable architecture: when you first design a site, you want to start with progressive enhancement. You don't begin everything with Flash. Start with HTML, links, and navigation, and then start adding fancy bonuses like AJAX and Flash. They are a complement to your site and not in lieu of it. This reduces dilution of PageRank when links are split between the Flash and non-Flash versions.

A site that is rich in media but does things very well: YouTube. There are videos, site navigation, title, descriptive content along side rich media.

When you use Flash, Google approves of sIFR (the idea is that JavaScript detects whether Flash is installed and, if it is, replaces the text with a Flash rendering). Takeaways: the text matches the content seen by Flash-enabled users. It's not only for search engines but also for users who use screen readers.

You might be using AJAX: use Hijax. If you want a user to click on foo=32, you're also creating a static HTML link. Search engines often ignore fragments (ajax.html#foo-32) but respect parameters (?foo=32).
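The fragment-versus-parameter point is easy to see in code. In this sketch, `crawler_view` is a hypothetical helper that drops the `#fragment` the way a crawler that ignores fragments would. The link list reuses the example URLs from the talk plus one made-up sibling of each.

```python
from urllib.parse import urldefrag


def crawler_view(url):
    """Return the URL with its #fragment removed."""
    return urldefrag(url).url


links = [
    "ajax.html#foo-32",   # fragment state: collapses to ajax.html
    "ajax.html#foo-33",
    "ajax.html?foo=32",   # parameter state: stays a distinct URL
    "ajax.html?foo=33",
]
distinct = sorted({crawler_view(link) for link in links})
print(distinct)
```

Both fragment links collapse into plain `ajax.html`, while the two parameter links remain separate, crawlable URLs. That is why the static fallback link in Hijax uses parameters.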

Webmaster Tools can be helpful, along with the help center, the webmaster blog, and the discussion group.
- Webmaster Tools covers the crawl pipeline: it reports crawl errors.
- You can also see the internal link structure. Verify that links are findable.

Now let's look at indexing. Promote your crawled content. Your preferred domain should be www or non-www; figure out which one you want. She shows an example where both versions are live, which dilutes PageRank.
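A common way to enforce a preferred domain is a 301 from the other hostname. The sketch below shows that decision as a plain function. `PREFERRED_HOST` and `ALTERNATE_HOSTS` are assumed values. A real site would usually do this in the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"   # assumption: the www version is preferred
ALTERNATE_HOSTS = {"example.com"}    # hostnames that should redirect to it


def preferred_domain_redirect(url):
    """Return (301, new_url) when the host is not the preferred one, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ALTERNATE_HOSTS:
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
        )
        return 301, target
    return None
```

A request for `http://example.com/shoes?color=red` is redirected to the same path and query on `www.example.com`, while a request already on the preferred host returns `None`.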

Affiliate ID and tracking IDs cause duplicate content. Keep it as clean as possible. Internally link to a canonical version. Store the information in a cookie. When you do something like this, you see a session file with cookie data. Yahoo mentioned you have dynamic parameters in Site Explorer. Google uses sitemaps as that alternative. In the location field, put the canonical version there and not the affiliate link.
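The "keep the URL clean and store the ID in a cookie" idea can be sketched as follows. The parameter names in `TRACKING_PARAMS` are hypothetical. The function returns the values to put in a cookie rather than setting a cookie itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of affiliate and tracking parameters.
TRACKING_PARAMS = {"affid", "trackid"}


def strip_tracking(url):
    """Split a URL into its canonical form and the tracking values to store in a cookie."""
    parts = urlsplit(url)
    kept = []
    tracked = {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in TRACKING_PARAMS:
            tracked[key.lower()] = value
        else:
            kept.append((key, value))
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return canonical, tracked
```

For `http://www.example.com/shoes?color=red&affid=partner7` this returns the canonical `http://www.example.com/shoes?color=red` plus `{"affid": "partner7"}`. The server can then 301 to the canonical URL and set the cookie on that response.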

You can also have video sitemaps on Google with title, description, and a thumbnail for those who have rich sites. There's also code search, mobile, and news sitemaps.

Sitemap submission also gives you index stats.

Response codes:
- Use 301s for permanent redirects. A 301 signals search engines to transfer properties from the source to the target. It's important if you're modifying URL structure or moving sites to a new domain.
- Serve a 503 if you're bringing your site down for maintenance. Don't serve a 404!
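The response code advice can be summarized in a small decision function. This is a sketch only: the `MOVED` and `KNOWN` tables and the one-hour `Retry-After` value are assumptions. A real application would wire this logic into its web framework.

```python
from http import HTTPStatus

MOVED = {"/old-shoes": "/shoes"}   # hypothetical permanent moves
KNOWN = {"/", "/shoes"}            # hypothetical pages that exist


def respond(path, maintenance=False):
    """Return (status, headers) following the response-code advice above."""
    if maintenance:
        # Temporary outage: 503 tells crawlers to come back later.
        return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": "3600"}
    if path in MOVED:
        # Permanent move: 301 with the new location.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    if path in KNOWN:
        return HTTPStatus.OK, {}
    return HTTPStatus.NOT_FOUND, {}
```

Note that the maintenance check comes first, so even pages that exist answer 503 during downtime instead of serving an error page with a 200 or a 404.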

For search results, there's the title, a snippet, and then a URL and then there are sitelinks. You can't control sitelinks (they're determined algorithmically).
- Create unique and informative titles. Every title is a good signal to users as to what the URL's contents are. Don't show "untitled." Webmaster Tools will show you title tag issues.
- Snippets provide the user with more context for search results. The quality of your snippet can impact your clickthrough. Some snippets are fairly long-winded or cryptic. How can you use this? Meta descriptions! They are utilized by Google. Focus on title tags first and then meta descriptions.

Utilize Webmaster Central, not just Webmaster Tools. It's such a hot topic - there are posts on the blog with fresh information about security. You undo a lot of great work if you get hacked. For small sites, there are security checklists and there's a recovery list for when you get hacked (so you can be reindexed more quickly).
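Going back to the advice on titles and meta descriptions, a quick audit of your own pages might look like the sketch below. It flags missing or "untitled" titles, duplicate titles, and missing meta descriptions. The issue labels and sample pages are made up. This is not how Webmaster Tools computes its reports.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Captures the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(pages):
    """Return a sorted list of (url, issue) pairs for a dict of URL -> HTML."""
    issues = []
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        title = parser.title.strip()
        if not title or title.lower() == "untitled":
            issues.append((url, "missing or placeholder title"))
        else:
            by_title[title].append(url)
        if not parser.description:
            issues.append((url, "missing meta description"))
    for urls in by_title.values():
        if len(urls) > 1:
            issues.extend((url, "duplicate title") for url in urls)
    return sorted(issues)
```

Running `audit` over a handful of pages gives a simple worklist: fix the titles first, then fill in the meta descriptions.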

