Meet The Crawlers

Mar 2, 2006 • 12:09 pm | comments (6) by twitter | Filed Under Search Engine Strategies 2006 New York
 


Session description:

"Representatives from major crawler-based search engines cover how to submit and feed them content, with plenty of Q&A time to cover issues related to ranking well and being indexed."

Moderated by Danny Sullivan. Speakers include Matt Cutts from Google, Kaushal Kurapati from Ask Jeeves, a representative from Yahoo! (Tim Mayer was not present) and Ramez Naam from MSN Search.

Audience Question: This is for Google and Yahoo: My site has over 500,000 products. What is the difference in the number of pages crawled and the number of mentions? For Yahoo we only have 500 results. Why is there a difference?

Yahoo: Use the Site Explorer tool. If Site Explorer only shows 500 pages, then there is an issue. Google: Every search engine crawls in a different way. Mentions are not the same as indexed pages: there are instances where we know about the URL, but we did not crawl it. Your site may not have enough PageRank for us to do a deep crawl. Yahoo: Site Explorer offers an option to provide an RSS feed of your site's URLs.

Audience Question: Disney search marketing manager. What are the search engines' capabilities for crawling Flash? What are the pitfalls?

MSN: Flash is difficult, so it will have an effect on the findability of your site. Yahoo: Flash is in the pipe, you should see some innovations coming soon. Google: We used to parse SWF files, but Flash and Ajax can be problematic. They break functions of your browser. My recommendation is to provide a text version of your site along with the Flash version. Yahoo: Cloaking was mentioned in a previous panel as a method of getting around all-Flash sites. Don't. Danny: A Flash page is like handing out a blank business card. Shows example of a jazz singer's site (he saw last night) that uses Flash with text.

Audience Question: Do you only crawl links found on pages or does your algorithm use queries from the toolbar? MSN: It may. Google: If you're trying to use the toolbar to get indexed, you should spend your time doing something else. Other ways to get into Google without inbound links: Site submit and Google Sitemap. Audience member: I don't want certain pages indexed. Panel: Add a robots.txt file excluding those pages and also put a password on that area. Google: Gives example of how Alexa toolbar has been spoofed and used to spam Matt's "related sites" info on the Alexa listing for his blog. Yahoo/Google: How many people would be concerned if anonymous toolbar data was used by search engines? Most of the audience raise their hands.
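The panel's robots.txt suggestion can be checked locally before it goes live. What follows is only a minimal sketch using Python's standard urllib.robotparser; the /private/ directory and the example.com URLs are made up for illustration and were not mentioned in the session.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed("http://example.com/private/report.html"))
print(is_allowed("http://example.com/products/widget.html"))
```

robots.txt only asks well-behaved crawlers to stay out; it does not block access, which is why the panel also recommends a password on the area.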

Danny: Brings up Flash issue and points to a thread on Search Engine Watch forum.

Audience Question: Is there any truth that search engines ignore robots.txt? MSN: No, we comply.

Audience Question: Asks about submitting to MSN. Is the RSS feed URL submission for MSN and Yahoo only for new content? MSN: You can submit multiple URLs to MSN and that is not seen as a spam activity. MSN also now supports URL submissions using an RSS feed. Yahoo: If URLs are repeated in different RSS feeds they will just be revisited.
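For readers who have not built a URL feed before, here is a rough sketch of what one might look like as plain RSS 2.0, generated with Python's standard library. The function name, the channel text and the example.com URLs are assumptions for illustration; the session did not say which fields each engine requires, so check the engine's own submission documentation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

def build_url_feed(site_url: str, page_urls: Iterable[str]) -> str:
    """Build a bare-bones RSS 2.0 document with one <item> per page URL."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = "URL list for " + site_url
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Pages submitted for crawling"
    for url in page_urls:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")

feed = build_url_feed(
    "http://example.com/",
    ["http://example.com/products/1", "http://example.com/products/2"],
)
print(feed)
```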

Audience Question: We use dynamic URLs to control page behaviors and run into problems where the same product is indexed under different URLs with different parameters. Do you have any tips on what we can do to avoid this? Yahoo: Search engines are getting better at indexing dynamic content, but you must be careful of spider traps. Suggestion would be to use the URL submission tools available, such as Google Sitemaps or the Yahoo and MSN URL submissions using an RSS feed. MSN: You can use robots.txt to block everything but the canonical version of your page URLs.
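One way to decide which URL is the canonical one is to normalize every variant on your own side before linking to or submitting it. The sketch below is an assumption about how a site might do that, not something the panel described; the parameter names in IGNORED_PARAMS are hypothetical and would need to match whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change page behavior but not the product shown.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}

def canonical_url(url: str) -> str:
    """Drop behavior-only parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Lowercase the host and discard any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

a = canonical_url("http://example.com/product?id=42&sort=price&sessionid=abc")
b = canonical_url("http://EXAMPLE.com/product?sessionid=xyz&id=42")
print(a)
print(a == b)
```

Both variants collapse to the same URL, so only that one needs to be linked internally and submitted, and the parameterized variants can be excluded in robots.txt as MSN suggests.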

Danny asks the search engines to get on the same page with robots.txt. Google: The only thing we don't support is crawl delay. Many webmasters used that parameter incorrectly. Yahoo: We try hard to adhere to the standard. Google: Google Sitemaps offers a robots.txt feedback tool. Danny: That's a great tool and I wish all the engines would do the same.
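For context, crawl delay is a non-standard robots.txt line that asks a crawler to wait a number of seconds between requests. A small sketch of how such a line reads, again with Python's urllib.robotparser (crawl_delay needs Python 3.6 or later); the ExampleBot name and the 10-second value are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ExampleBot should wait 10 seconds between requests.
ROBOTS_WITH_DELAY = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_WITH_DELAY.splitlines())

# Returns the delay in seconds, or None if no Crawl-delay applies.
delay = parser.crawl_delay("ExampleBot")
print(delay)
```

The value is in seconds, so a large number cuts how many pages a compliant crawler can fetch in a day.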

Audience Question: How does the rate at which pages get updated that are linking to you affect your site getting crawled? MSN: Refresh rate of pages pointing to you doesn't factor. What matters is the freshness of your own site. Yahoo: Inbound links are more important for discovery. Google: The rate of change of source pages is very much a secondary consideration. MSN: Regarding links: Links that look natural, that provide value are the ones we use. Also instead of buying links, think about creating unique content that provides value and people will link to it naturally.

Audience Question: We have a competitor that builds duplicate copies of his ecommerce site and the crawlers don't seem to be able to see this. We're thinking of doing the same thing if the crawlers aren't going to do anything about it. Yahoo: The algorithms are continuously being improved and in some cases we need to look at situations individually. Google: Agrees, feel free to provide a specific example. Fill out a spam report and give an example. We do the best we can, but we need feedback. Ask: We try to take care of these situations when we discover them.

Audience Question: How does server response time affect crawling? MSN: A slow response time can be perceived as a down web site. It may cause us to crawl you more slowly. Yahoo: We'll typically revisit the site after a few days.

Danny: Is MSN going to do anything like Site Explorer? MSN: We're very interested in improving what we can offer webmasters and will be developing tools of that nature. Yahoo: We are adding new features to Site Explorer. Ask: As we ramp up with processes and resources to deal with the queue that builds up.

Audience Question: Some major news publications will list a URL but not create a hyperlink to a site. Do you take that information into account? Ask: We do assign credit for newly found sites. If there is already a link to the site, additional links to the same site are not considered. The URL as text is not treated as a link. Yahoo: At this point we do not treat a text URL as a link. Google: That delves into the secret sauce. Think of coverage in major publications as a traffic source but not as a way of getting link popularity. Yahoo: Y!Q creates links automatically to popular resources.

Google: Matt shows his Google Sitemaps data using the new version of Sitemaps. For some reason Matt's blog is #2 for a phrase like "free porn" on Google Local. Shows a variety of information on his blog.

Yahoo: Points out answers.yahoo.com and is looking for feedback.
