Meet The Crawlers

Aug 23, 2007 | Filed Under: Search Engine Strategies 2007 San Jose
Representatives from major crawler-based search engines cover how to submit and feed them content, with plenty of Q&A time to cover issues related to ranking well and being indexed. Danny Sullivan, the conference co-chair, is moderating; the panelists are Peter Linsley of Ask.com, Evan Roseman of Google, Eytan Seidman of Microsoft, and Sean Suchter of Yahoo! Search.

Eytan is up first for a short presentation. He talks about their Live Webmaster Portal which includes features on how Microsoft will crawl your site. They support site map submissions and you can also see statistics specific to your web site.

They have multiple crawlers, each with a user agent that begins with "MSNBot":

- web search
- news
- academic
- multimedia

Next he points out that they support the "NOCACHE" and "NOODP" meta tags.
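As a sketch of how these directives are typically written (a hypothetical page, not one shown in the session), they go in the page's head as robots meta tags:

```html
<head>
  <!-- nocache: ask the engine not to offer a cached copy of this page -->
  <!-- noodp: ask the engine not to use the Open Directory Project
       title/description for this page's snippet -->
  <meta name="robots" content="nocache, noodp">
</head>
```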

Sean is up next for a short presentation on some updates to the Yahoo! crawler. One is dynamic URL rewriting via Site Explorer. Another is the "robots-nocontent" tag, which allows you to mark certain portions of a web page as content the crawler should ignore. They have also implemented crawler load improvements (reduction and targeting): the new crawler fetches at lower volume with better targeting.
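For illustration, Yahoo!'s "robots-nocontent" directive is applied as a class attribute on the HTML element wrapping the section to be ignored (hypothetical markup):

```html
<!-- Yahoo!'s crawler treats anything inside this element as non-content,
     e.g. boilerplate navigation that shouldn't influence ranking -->
<div class="robots-nocontent">
  <a href="/about">About</a> | <a href="/contact">Contact</a>
</div>
```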

Evan is up next and to start things off, he highlights Webmaster Central and explains some of its features. He suggests that you take advantage of it to submit a site map so that Google can index all your content. He also points out the Google Help Center in which they feature answers to some of the most common questions.

Finally, Peter is up. He talks about catering to the search engine robot: in catering to the actual human visitor, the robot is often forgotten. One common problem is requiring cookies, which crawlers do not accept. He points out that Ask does accept site map submissions, but they would rather be able to crawl sites naturally.

Peter uses the Adobe site to demonstrate some of the issues engines can have with multiple domains and duplicate content. He then uses the Mormon.org site and shows that it disallows crawlers from accessing the root page, which creates problems with crawling.
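To illustrate the kind of rule Peter is describing (a hypothetical robots.txt, not Mormon.org's actual file), a disallow on the root path blocks the home page and everything beneath it:

```text
# robots.txt at the site root
# "Disallow: /" applies to the root page and every URL under it,
# so compliant crawlers cannot index the site at all.
User-agent: *
Disallow: /
```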

Now begins the Q&A portion of the session.

Q: First question is for the Google rep. The questioner wants to know whether Google will allow users to see supplemental results within Webmaster Central now that they are no longer tagged in search results.

A: Evan stated that being in the supplemental index is not a penalty, but did not give a definite answer as to whether they would allow users to discover whether or not results are supplemental.

Danny interjects that all engines have a two-tier system and Eytan, Sean and Peter confirmed that. So... they all have supplemental indices but people only seem to be concerned with Google's, most likely because they used to identify them as such in the regular search results.


Q: What can a competitor actually do, if anything, to hurt your site?

A: Evan says it is possible for a competitor to hurt your site, but extremely difficult. Hacking and domain hijacking are some of the things that can occur.


Q: When you re-publish content to places such as eBay, the sites you re-publish to sometimes rank better than the original. How can a webmaster identify the original source of the information?

A: Peter answers that one could try to get the places you republish content to block spidering of that content via robots.txt. Another option is to have a link back to the original site, though on a site such as eBay that is not always possible. Failing that, create unique content for the sites you re-publish to.


Q: Robert Carlton asks whether all engines are moving toward having tools like Webmaster Central. He also asks how they treat 404s and 410s.

A: As for 404s and 410s, Ask, Google and Yahoo! treat them the same. Robert points out that they should treat them differently, as a 410 indicates the file is permanently gone whereas a 404 is a generic not-found error.
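For webmasters who want to make the distinction anyway, a 410 can be returned explicitly; for example, in Apache's mod_alias (hypothetical path shown):

```apache
# Respond with HTTP 410 Gone for a permanently removed page,
# rather than the default 404 Not Found.
Redirect gone /discontinued-page.html
```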


Q: How can you get content crawled more frequently?

A: Evan suggests using the Site Map feature in Webmaster Central and keeping it up to date. He also suggests promoting new content by linking to it from the home page of the site.
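A minimal sitemap file in the standard sitemaps.org XML format looks like this (example.com and the values shown are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <!-- lastmod/changefreq are hints that help engines prioritize re-crawling -->
    <lastmod>2007-08-23</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
```

Keeping the lastmod values accurate as pages change is what makes the file useful for the fresher-crawling question asked here.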


Q: How can one use site maps more effectively for very large sites whose information changes on a regular basis? Also, how do you get more pages indexed when only a portion are being indexed?

A: Submitting a site map to Google is not going to cause other URLs not to be crawled. Evan also points out that they are not going to be able to crawl and include ALL the pages that are out there. He again suggests that webmasters promote pages, such as by linking to them from the home page, though with hundreds of thousands of pages that is not always feasible.


Q: How do engines interpret things like AJAX, JavaScript, etc.?

A: Eytan answered that if a webmaster wants things interpreted, they will have to represent them in a format the engines can understand; AJAX and JavaScript are currently not among them.


Q: A questioner's rankings in Yahoo! disappeared for three weeks but then came back. Is this due to an update?

A: Sean answers that it certainly could be and suggests using Site Explorer to see if there is some kind of issue.


Q: How many links will engines actually crawl per page? How much is too much?

A: Peter says there is no hard and fast rule but keep the end user in mind. Evan echoes the same feeling.


Q: Do the engines use meta descriptions?

A: All engines read them and may display them if the algorithm deems them relevant.


Q: For sites that are designed completely in Flash, can you put content in a "noscript" tag, or would that be considered some type of cloaking?

A: Sean said IP delivery is a no-no, but if the content is the same as the Flash, he'd rather see content in a noscript tag than traditional cloaking. Evan suggests avoiding sites built completely in Flash, instead using Flash components within HTML pages.
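As an illustrative sketch (hypothetical file names), when a Flash movie is embedded via JavaScript, the noscript fallback described here carries the same text as the movie; the key point from the panel is that the fallback must match what the Flash actually shows:

```html
<!-- JavaScript-based Flash embed with an HTML fallback for crawlers
     and users without script support -->
<div id="flash-content">
  <script type="text/javascript" src="embed-movie.js"></script>
</div>
<noscript>
  <!-- Must be the same text the Flash movie displays, or it risks
       being treated as cloaking -->
  <p>Plain-HTML version of the movie's text content goes here.</p>
</noscript>
```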


Q: Is the meta keywords tag still relevant?

A: Microsoft - no, Yahoo! - not really, Google - not really, and Ask - not really. All read it, but it has very little bearing. For a really obscure keyword that appears only in the keywords tag and nowhere else on the web, Yahoo! and Ask are the only ones that will show a search result based on it.
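For reference, the two tags the last couple of questions cover are written like this (placeholder values shown):

```html
<!-- Description: may be shown as the result snippet when deemed relevant -->
<meta name="description" content="A one- or two-sentence summary of the page.">
<!-- Keywords: read by the engines but given almost no ranking weight -->
<meta name="keywords" content="example, sample, terms">
```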


Q: How do engines view automated submission/ranking software?

A: Evan - don't use them.


I asked Peter Linsley a question after the session about whether Ask is working to make its index fresher; in other words, whether they are working to re-index content as fast as the other engines do, since it typically takes six months or more for changes made to pages to show up in the Ask index.

He said they are working on it but cannot give me any definite timeframe as to when that might be rolled out.

I also asked whether they prioritize sites such as CNN or Amazon, so that changes to those sites are reflected in the index more frequently than changes to a mom-and-pop brochure-type site, and he confirmed that was true.


David Wallace - CEO and Founder SearchRank


Comments:

Michael Martinez

08/23/2007 08:28 pm

I think it's interesting that Yahoo! confirms they have a Supplemental Index. Ask and Microsoft's Supplemental Indexing have been known (at least to a few of us) but I only suspected Yahoo! might be segregating content when they ran the main search "Site:" query operator side-by-side with SiteExplorer (all "site:" queries now redirect to SiteExplorer). Google needs to quit pussyfooting around the issues, however, and just let the Supplemental Index pages compete on an equal footing for queries with Main Web Index pages.

Angelina Jones

09/09/2009 11:57 am

Companies that cannot afford the expenses required to push themselves up in search engines rankings are increasingly resorting to more underhanded methods to knock down the image and reputation of their more well-funded competitors. These methods include smear campaigns, the spreading of false rumors, misleading information, and anything else that may damage a company’s reputation to the point where it puts doubt in the minds of consumers considering the purchase of that company’s products and services. A well run campaign will continue to add negative commentary over time to make it appear that there is some sort of growing movement against the targeted company. The commentary can be posted on blogs, forums, in articles, or any place else where it can be seen by consumers on the internet. Search engine optimization of the negative content can draw more viewers to it and increase its “believability” regardless of it being poorly written or its inaccuracies. The damage done, those consumers are then steered toward the sponsors of the negative content.
