Meet the Crawlers: Submissions and Feeds Edition.

Aug 11, 2005 • 5:09 pm | by twitter | Filed Under Search Engine Strategies 2005 San Jose
 

Moderator: Danny Sullivan. Welcome; the session focuses on indexing and submission issues.

Kaushal Kurapati - AskJeeves. Brief intro of AskJeeves. They reach 25% of the US audience. Crawler goals: follow robots.txt standards. We try to practice “politeness”: be gentle to your servers; you can tell us where to crawl and where not to crawl. Use the noarchive, noindex, and nofollow standards. Efficiency: compression saves bandwidth (up to 75% savings with gzip). Also avoid duplicates. Freshness: variable rates of crawling. Completeness: multiple file types: HTML, PDF, Flash, MS Office, XML. Time/date stamping your content helps. Simplify site organization and navigation to ensure crawlers can reach all parts of the site. Use site maps. Watch out for infinite pages, such as calendars serving the year 3001. Do not put session IDs on URLs. Can I submit my site for indexing? We have gone away from site submission; we are able to find sites organically now. My site’s pages are not in the index yet? Patience please; crawling speeds vary. There is a FAQ page for spiders. JavaScript: parsing is difficult. Dynamic pages are cause for more selection in indexing; they are screened for dupes before crawling. URLs within images cannot be followed.
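The advice about session IDs and duplicates can be illustrated with a small sketch: two URLs that differ only in a session parameter point to the same page, so a crawler sees duplicates unless the parameter is dropped. The parameter names and example.com URLs below are assumptions for illustration, not anything named in the talk.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; real sites use their own.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def strip_session_params(url: str) -> str:
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Two visits to the same page, each with a different session ID.
urls = [
    "http://www.example.com/products?id=7&sid=abc123",
    "http://www.example.com/products?id=7&sid=zzz999",
]

# After stripping, both collapse to a single canonical URL.
canonical = {strip_session_params(u) for u in urls}
print(canonical)
```

Keeping session state out of the URL in the first place (for example, in a cookie) avoids the need for this kind of cleanup on the crawler side.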

Debbie Jaffe - Google. Will talk about sitemaps, which help people discover more of your web pages. G Sitemaps: what is it? A free and easy way to help G discover more about your sites. It allows you to inform G directly about site changes and enables G to crawl your site more effectively. This is a collaborative program with webmasters, intended for all sites large and small. Webmasters and users get better crawl coverage, fresher search results, and a smarter crawl. How does it work? Create a sitemap, using the sitemap generator available at G if you want (search “sitemap generator Google”). Submit a simple text file with all your URLs. You can include the relative priority of pages (not relative to other pages on the web, but relative to your own). Then submit the sitemap and update it as needed. You need to set up an account as a webmaster. You can then track all of your submissions via an easy-to-use reporting system. They think it is a great BETA program worth trying out in order to help G provide more and fresher content. She wants to add that this is just a supplement to the standard crawling already occurring.
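For readers who want to see what a sitemap with per-page priority might look like, here is a minimal sketch. It assumes the XML layout of the later public sitemaps.org protocol (urlset, url, loc, priority), which may differ from the format of the 2005 beta described in the talk; the example.com URLs and priority values are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the public sitemaps.org protocol (an assumption relative to the talk).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap XML string from (url, priority) pairs.

    Priority is relative to the other pages on the same site (0.0 to 1.0).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages: the home page matters more than an archive page.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/archive/2004.html", 0.3),
])
print(sitemap_xml)
```

The generated file would then be submitted through the webmaster account and regenerated whenever the site changes.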

Tim Mayer - Yahoo! It is great to see another company adopting feeds; Y! has been using these since 2001 and has great experience and good results. Overview of the Y! search vision: enable people to find, use, share, and expand all human knowledge. Focus is on “Find.” Search is not for the sake of searching, but to achieve a purpose. Once you have found something, you can share the knowledge with others. One thing people forget is to link pages from other pages. To encourage deeper crawling, he would recommend not making site depth too extreme (3-4 levels recommended). Use the free addURL service if all else fails: Submit.search.yahoo.com/free/request. Index-friendly pages: unique content with page-specific titles and descriptions. Separate pages only when there is separate content. Multiple domains only when there is a distinct business. Avoid spam such as kw stuffing, excessive cross linking, and cloaking. Yahoo crawlers include Slurp, Seeker, a multimedia crawler, and an audio-video crawler. Supports c command in order to stop caching. Can also add a crawl delay to help your servers. About to launch “Site Explorer” to see how many docs are already crawled in the index: Siteexplorer.yahoo.com (something else here), announced today. Find and save RSS feeds, available below search results. Use Add.my.yahoo.com/rss to add your RSS feeds within about 48 hours; you will then see the link appearing below your listings in the future. Search Submit Pro is a paid feed program that allows for reporting and complete control over titles and abstracts. The paid inclusion system is an entirely different content system than the main crawl. Over 99% of the index is crawled for free. Lists a fair amount of support links available at Yahoo, including site questions like “I think I got banned,” etc. The best is yet to come: see the Yahoo Search Blog at ysearchblog.com and go to next.yahoo.com to see new and future products.
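The crawl delay mentioned above is typically expressed in robots.txt. The sketch below uses Python's standard urllib.robotparser to read a sample file the way a polite crawler might; the Crawl-delay value, the disallowed paths, and the example.com URLs are assumptions for illustration and were not given in the talk.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking Slurp to wait 5 seconds between requests.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the named crawler is asked to wait between fetches.
delay = parser.crawl_delay("Slurp")

# Whether the named crawler may fetch each URL.
allowed = parser.can_fetch("Slurp", "http://www.example.com/index.html")
blocked = parser.can_fetch("Slurp", "http://www.example.com/private/report.html")

print(delay, allowed, blocked)
```

A larger delay trades crawl freshness for lower server load, so it is worth setting only as high as the server actually needs.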

Q&A. “Is there a way to do the Google sitemaps type system at Yahoo?” Tim: We just launched the feed to be able to do that. We will be expanding the products into the future.

Danny asks how many are using G sitemaps (seemed as if a fair amount), Yahoo! paid inclusion? (same amount), anyone using one system to submit to both? (none; seemed surprised by that). Fair to say that the room would encourage you all to come together and do this.

“Does the sitemap feed affect the regular crawl?”

D: No, it doesn’t affect that. It does allow for additional information to be added. Danny asks how many people who use sitemaps have benefited from it; most have. Only one person had no effect, and no one raised their hand to “negative effect?”

“How to make sure country-specific engines pick up Yahoo content?”

T: No brainer way is to get a separate domain for each. Other way would be to make sure there are inbound links from that specific country to the particular content on the site. Somebody comments that you have to live in the country to get a domain. Tim says there are some services available that can be costly that provide this sort of help.

Danny adds that if you host in a particular country, it will help. Linkage is very important, especially from authorities such as the BBC in the UK, for example.

D: It is the same thing at G: the index is generally the same.

“Anything in addition to using 301s when changing a large site and changing many URLs? Not root, but wanted to use 301s for the top fifty pages.”

Tim: Why change? They are going for more search engine friendliness. Tim wonders why you would change if you already have good rankings. Danny responds to the idea that search terms appear in the URLs of many top results: many top-ranked pages do not employ that. He thinks it is probably other factors causing the content to be indexed highly. Danny pulls up a couple of searches and shows that it is more important for the kw to be in the title than in the URL.

K: They feel that content is the most important instead of the other stuff.

“What is the determining factor of how many pages get indexed?”

D: Each individual sitemap allows for up to 50K URLs. You can put sitemaps in individual directories if you have more than 50K pages. There is no specific quota that she is aware of.
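The 50K-per-sitemap limit means a larger site has to split its URL list across several files. A minimal sketch of that split, assuming made-up example.com URLs and a hypothetical helper name:

```python
# Limit per sitemap as stated in the session.
MAX_URLS_PER_SITEMAP = 50_000

def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

# A hypothetical site with 120,000 pages needs three sitemap files.
urls = [f"http://www.example.com/page/{n}" for n in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written out as its own sitemap file, for example one per directory as suggested above.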

T: What matters is high-quality signals such as authoritative inbound links and no spam. There is no one factor that can be described as the largest. There are lots of things you can do to help the crawler want to dig deeper.

K: Two things that are factored in are the depth of the site and whether you have a dynamic URL that may block the crawler.
