Meet the Crawlers

Apr 26, 2006 • 1:54 pm | Filed Under Search Engine Strategies 2006 Toronto
 

Moderator: Chris Sherman, Executive Editor, SearchEngineWatch.com

Google Sitemaps Launches New Features: Shiva Shivakumar from Google talked about the new Google Sitemaps launch and gave a live demo of the new features. He logged into his account and showed the "my sites" section. The pages look new. There is a "diagnostic" tab that shows summary data, including an indexing summary, potential indexing problems, and so on. The links on the left-hand side give more detailed information; it looks like they were moved there from a sub-tab at the top. For constitution.org, Google now shows the message "no pages from your site is in the Google index." He went to the site and pointed out hidden text at the bottom of the page, which is the reason Google shows that message. This is pretty big stuff. He then moved to another site, showed the "statistics" main tab, and walked through "query stats," "crawl stats," "page analysis," and "index stats." Next he clicked the "sitemaps" main tab and pulled up google.com/sitemap.xml to show the XML document. Finally, he clicked "robots.txt analysis" under the "tools" section on the left-hand side, which lets you check whether or not your pages will be crawled.
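The demo pulled up an XML sitemap. As a minimal sketch of what such a file contains, the Python below builds one with the standard library's xml.etree.ElementTree. It assumes the namespace of the later sitemaps.org protocol (version 0.9); the Google Sitemaps schema in use at the time of this session had its own Google-specific namespace. The URLs and dates are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace of the sitemaps.org protocol, version 0.9 (an assumption for this
# sketch; it postdates the 2006 Google Sitemaps schema shown in the demo).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in query strings for us.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", date(2006, 4, 21)),
        ("http://www.example.com/about.html", date(2006, 3, 1)),
    ]))
```

Listing a lastmod date per URL also ties in with the "last modified" advice later in the session.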

Stephen Evans from MSN Canada was next. New products: Windows Desktop Search, a refreshed user interface, MSN Local Search beta, Windows Live Search beta, and crawling of images, news and more. As much as possible, MSN Search will attempt to crawl and index pages that help users find what they are looking for. The basics: build a site map; use robots.txt; be conscious of URL length, query parameters and session variables; beware of text in images; have unique content; get links to your site or submit your URL. Nothing can replace high-quality content. Also pay attention to descriptive titles, redirects (HTML redirects are best, 301 or 302 are hard), JavaScript, page weight (150KB) and a canonical domain. Things to avoid: keyword stuffing, duplicate copies, cloaked content, hidden text and link farms.
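Both the advice to use robots.txt and Google's robots.txt analysis tool come down to one question: will a given URL be crawled? As a rough local check (not any engine's own tool; real crawlers can differ in details such as wildcard support), Python's standard urllib.robotparser can answer that question against a robots.txt file. The rules and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; on a live site you would instead call
# RobotFileParser.set_url("http://www.example.com/robots.txt") and read().
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def will_be_crawled(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/report.html",
                "http://www.example.com/cgi-bin/search"):
        print(url, will_be_crawled(rules, "Googlebot", url))
```

Note that Googlebot may fetch /cgi-bin/search under these rules: a crawler follows only the most specific User-agent group that matches it, so the `*` group applies only to crawlers without a group of their own.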

Andy Renieris from Yahoo! Canada Search went over Yahoo's vision: find, use, share and expand. How to get into the index: link new URLs from an existing page in the index, make sure all URLs have an inbound link, get good authoritative links, don't make site depth too extreme, or use the free Add URL form. Index-friendly pages have unique content and avoid spam. For French sites, use French meta tags and meta descriptions. He put up the classic "how Yahoo handles redirects" slide. URL rewriting is important: parameters are often changed to pseudo-paths, session IDs should be removed, and the depth of the URL should be limited. He showed the Yahoo crawlers: web, shop, audio, news, etc. Recent Yahoo additions: Site Explorer (not so new), now with RSS and Atom feed submission support, a ping interface via API, and an added internal link filter, with more things coming to Site Explorer soon. They also have My Web 2.0 and its "save to My Web" button. They just did an index update on April 21st.
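Yahoo's URL advice (remove session IDs, turn parameters into pseudo-paths) can be illustrated with Python's urllib.parse. This is a minimal sketch of the URL transformation only; the server still needs a matching rewrite rule (for example Apache mod_rewrite) to map the pseudo-path back to the script. The session parameter names and the catalog.php URL are hypothetical.

```python
import posixpath
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Hypothetical session-ID parameter names; real platforms use many variants.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_ids(url: str) -> str:
    """Drop session-ID query parameters so each page has a single URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def to_pseudo_path(url: str, path_params: list) -> str:
    """Move the named query parameters into path segments, in the given order.

    Session IDs are removed first; parameters not listed stay in the query.
    """
    parts = urlsplit(strip_session_ids(url))
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # "/catalog.php" -> "/catalog"; rstrip avoids a double slash for "/".
    base = posixpath.splitext(parts.path)[0].rstrip("/")
    segments = [quote(params.pop(name), safe="")
                for name in path_params if name in params]
    path = "/".join([base] + segments)
    return urlunsplit(parts._replace(path=path, query=urlencode(params)))
```

For example, `to_pseudo_path("http://www.example.com/catalog.php?cat=books&id=42&sid=abc123", ["cat", "id"])` returns `"http://www.example.com/catalog/books/42"`: the session ID is gone and the URL is only two levels deep.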

Kaushal Kurapati from Ask.com went over the stats: the #6 US web property, 28.5% reach, 48.8 million domestic unique users, a 5.9% share of US searches, and a division of IAC. Crawler goals: follow robots.txt standards; politeness (crawl delay, noarchive, noindex, nofollow); efficiency (compression, avoiding duplicates); freshness; and multiple file types (HTML, PDF, Flash, MS Office, etc.). (Barry notes: when did they add "nofollow"?) Date-stamp your content, it helps, so put a "last modified" stamp on your pages. Simplify site organization and navigation, ensure crawlers can reach all parts of the site, and use site maps. Watch out for infinite pages, calendars (year 3001) and session IDs. Crawler challenges: JavaScript, dynamic pages, and images with URLs.
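One way to read the "last modified" advice at the HTTP level (an interpretation, not something stated in the session) is a Last-Modified response header, which lets a crawler send If-Modified-Since on its next visit and receive a short 304 Not Modified instead of the full page. A minimal sketch using only the standard library's email.utils date helpers; the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(modified: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date, e.g. for Last-Modified."""
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


def response_status(modified: datetime, if_modified_since) -> int:
    """Return 304 if the page is unchanged since the crawler's copy, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    # HTTP dates have one-second resolution, so ignore microseconds.
    if modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

The comparison is deliberately conservative: any missing or malformed header falls back to a normal 200 response, so a crawler never misses changed content.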
