Meet The Crawlers

Aug 23, 2007 - 3:16 pm

Representatives from the major crawler-based search engines cover how to submit and feed them content, with plenty of Q&A time to cover issues related to ranking well and being indexed. Danny Sullivan, the conference co-chair, is moderating; the panelists are Peter Linsley of Ask, Evan Roseman of Google, Eytan Seidman of Microsoft and Sean Suchter of Yahoo! Search.

Eytan is up first for a short presentation. He talks about their Live Webmaster Portal which includes features on how Microsoft will crawl your site. They support site map submissions and you can also see statistics specific to your web site.

They have multiple crawlers, each with a user agent that begins with "MSNBot":

- web search
- news
- academic
- multimedia

Next he points out that they support the "NOCACHE" and "NOODP" meta tags.
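For reference, these values go in a robots meta tag in the page head; a minimal sketch (the exact values supported are as per Eytan's presentation):

```html
<!-- Ask the engine not to show a cached copy of this page -->
<meta name="robots" content="nocache">
<!-- Ask the engine not to substitute the DMOZ (ODP) description for your snippet -->
<meta name="robots" content="noodp">
```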

Sean is up next for a short presentation on some updates to the Yahoo! crawler. One is dynamic URL rewriting via Site Explorer. Another is the "robots-nocontent" tag, which allows you to flag portions of a web page that should not be treated as page content. They have also implemented crawler load improvements (reduction and targeting): the new crawler has lower volume with better targeting.
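Yahoo! implemented robots-nocontent as a class attribute on the markup you want excluded; a minimal sketch:

```html
<!-- Text marked this way is ignored when Yahoo! ranks and abstracts the page -->
<div class="robots-nocontent">
  Copyright notice, navigation and other repeated boilerplate here.
</div>
```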

Evan is up next and to start things off, he highlights Webmaster Central and explains some of its features. He suggests that you take advantage of it to submit a site map so that Google can index all your content. He also points out the Google Help Center in which they feature answers to some of the most common questions.

Finally, Peter is up. He talks about catering to the search engine robot, since in catering to the actual human visitor the robot is often forgotten. One common problem is requiring cookies, which crawlers typically do not support. He points out that Ask does accept site map submissions, but that they would rather be able to crawl sites naturally.

Peter uses the Adobe site to demonstrate some issues the engines may have with multiple domains and duplicate content. He then shows that the site is disallowing crawlers from the root page, which creates problems with crawling.
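To see the effect of that kind of root-level block, here is a sketch using Python's stdlib robots.txt parser; the rules below are hypothetical, not Adobe's actual file:

```python
# Sketch: how a blanket root Disallow blocks crawlers entirely.
# These rules are illustrative only, not any real site's robots.txt.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# With the root disallowed for all user agents, no page can be fetched.
print(rp.can_fetch("MSNBot", "http://www.example.com/"))      # False
print(rp.can_fetch("MSNBot", "http://www.example.com/page"))  # False
```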

Now begins the Q&A portion of the session.

Q: The first question is for the Google rep. The questioner wants to know whether they will allow users to see supplemental results within Webmaster Central now that they are no longer tagging them in search results.

A: Evan stated that being in the supplemental index is not a penalty, but did not give a definite answer as to whether they would let users discover whether or not results are supplemental.

Danny interjects that all engines have a two-tier system, and Eytan, Sean and Peter confirm it. So... they all have supplemental indices, but people only seem to be concerned with Google's, most likely because Google used to identify them as such in the regular search results.

Q: What can a competitor actually do, if anything, to hurt your site?

A: Evan says there is a possibility that a competitor could hurt your site, but said it is extremely difficult. Hacking and domain hijacking are some of the things that can occur.

Q: The question relates to a scenario where you re-publish content to places such as eBay, but the sites you re-publish to rank better than the original. How can the original source of the information be identified?

A: Peter answers that one could try to get the places where content is republished to use robots.txt to block spidering of that content. Another option is to have a link back to the original site; however, on a site such as eBay that is not always possible. His fallback suggestion is to create unique content for the sites being republished to.

Q: Robert Carlton asks if all the engines are moving toward having tools like Webmaster Central. He also asks how they treat 404s and 410s.

A: As for 404s and 410s, Ask, Google and Yahoo! treat them the same. Robert points out that they should treat them differently, as a 410 indicates the file is permanently gone whereas a 404 is a generic not-found error.

Q: A question about getting content crawled more frequently.

A: Evan suggests using the Site Map feature in Webmaster Central and keeping it up to date. He also suggests promoting new content by placing a link to it on the home page of the site.
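The site map Evan refers to follows the Sitemaps protocol, a simple XML list of URLs; a minimal sketch with hypothetical values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-08-23</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
```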

Q: How can one use site maps more effectively for very large sites whose information changes on a regular basis? The questioner also asks how to get more pages indexed when only a portion are being indexed.

A: Submitting a site map to Google is not going to cause other URLs to not be crawled. Evan also points out that they are not going to be able to crawl and include ALL the pages that are out there. He again suggests that webmasters promote pages, such as by listing them on the home page; however, when dealing with hundreds of thousands of pages, that is not always feasible.

Q: How do engines interpret things like AJAX, JavaScript, etc.?

A: Eytan answers that if webmasters want things interpreted, they are going to have to represent them in a format the engines can understand, and AJAX and JavaScript are currently not among them.
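A common workaround, sketched here with illustrative names, is to mirror script-driven navigation with plain HTML links a crawler can follow:

```html
<!-- Script-driven navigation the engines cannot interpret -->
<a href="#" onclick="showSection('products'); return false;">Products</a>
<!-- A plain link to the same content that any crawler can follow -->
<a href="/products.html">Products</a>
```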

Q: A question about rankings in Yahoo! disappearing for three weeks and then coming back. Is this due to an update?

A: Sean answers that it certainly could be and suggests using Site Explorer to see if there is some kind of issue.

Q: How many links will engines actually crawl per page? How much is too much?

A: Peter says there is no hard-and-fast rule, but keep the end user in mind. Evan echoes the same feeling.

Q: Do the engines use meta descriptions?

A: All engines read them and may use them if the algorithm deems them relevant.

Q: For sites designed completely in Flash, can you put the content in a "noscript" tag, or would that be considered some type of cloaking?

A: Sean said IP delivery is a no-no, but if the content is the same as the Flash, he'd rather see content in noscript than traditional cloaking. Evan suggests avoiding sites built completely in Flash, using Flash components instead.
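A sketch of the noscript approach discussed, assuming the text mirrors the Flash movie exactly (file names and copy are hypothetical):

```html
<object type="application/x-shockwave-flash" data="home.swf" width="600" height="400">
  <param name="movie" value="home.swf">
</object>
<noscript>
  <!-- Must match the Flash movie's content; different text risks being treated as cloaking -->
  <p>Welcome to Example Widgets. We sell and repair blue widgets.</p>
</noscript>
```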

Q: Is meta keywords tag still relevant?

A: Microsoft - no, Yahoo! - not really, Google - not really, and Ask - not really. All read it, but it has very little bearing. For a really obscure keyword that appears only in the keywords tag and nowhere else on the web, Yahoo! and Ask are the only ones that will show a search result based on it.
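For completeness, the tag in question, which per the panel carries almost no weight (content values are illustrative):

```html
<meta name="keywords" content="blue widgets, widget repair, widget parts">
```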

Q: How do engines view automated submission/ranking software?

A: Evan - don't use them.

I asked Peter Linsley a question after the session about whether Ask is working to make its index fresher. In other words, are they working to re-index content as fast as the other engines do, as it typically takes six months or more for changes made to pages to show up in the Ask index.

He said they are working on it but could not give me any definite timeframe as to when that might roll out.

I also asked if they prioritize sites such as CNN or Amazon, in that changes to those sites are reflected in the index more frequently than for a mom-and-pop brochure type of site, and he confirmed that was true.

David Wallace - CEO and Founder SearchRank

