This is Danny Sullivan's pet session. He introduced it as a discussion of link spam and other types of spam. Danny said he wanted a noindex tag for specific sections of a page instead of the nofollow tag, which Matt Cutts spearheaded. He discussed the forum thread on this. On the panel are Ask Jeeves, MSN, Google and Yahoo!. By the way, I have Kim Krause & Bill S. from Cre8asite on my right, and randfish, orion, Mike Grehan and Christine Churchill on my left.
Matt Cutts from Google was up first and showed a slide of guest-book spam, explaining that such a link is not a true vote. What they needed was a way for webmasters to mark up links on their site to say "I did not vouch for this link." Danny then had an indexing summit article, after which they contacted weblog companies and then asked Yahoo, MSN and Ask Jeeves for support (MSN & Yahoo supported it). It has only been 6 weeks since it was implemented, and they have already seen a positive impact. He then showed the nofollow tag, which looks like <a href="http://www.example.com/" rel="nofollow">discount pharmacy</a>. He then showed 20+ companies (search and blogs) that support this tag. It's better than not having it, he said. Spammers hate it, he said, just like werewolves hate silver bullets (I believe he made a comment towards Nick Wilson about his blog and spammer followers hating it - Nick, eat that up please). Spammers are shifting towards different types of spam and moving toward smaller blog packages. There are now better lines of communication between software makers and search engines. Yahoo hosted a web spam "squashing" summit last week. He said Google is open to future cooperation.
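To make the publisher side of this concrete, here is a minimal sketch of how blog or guest-book software might add rel="nofollow" to every link in a visitor's comment before displaying it. The class and function names (NofollowRewriter, add_nofollow) are my own and hypothetical, not something shown in the session. It also makes simplifying assumptions: it replaces any existing rel value, drops HTML comments, and is not a sanitizer.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit HTML, forcing rel="nofollow" on every <a> tag (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open_tag(self, tag, attrs, self_closing=False):
        if tag == "a":
            # Assumption: replace any existing rel value rather than merge with it.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        return f"<{tag}{rendered}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" on all of its links."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('<a href="http://www.example.com/">discount pharmacy</a>'))
```

Run on the spam link from Matt's slide, this should produce the same markup he showed. Real blog packages would do this inside their comment templates rather than as a separate pass.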
Tim Mayer from Yahoo! was up next with his "Comment Spam Proposal." He said Yahoo! came up with a slightly different proposal than Google. Yahoo! just rolled out support for the nofollow tag LAST NIGHT, so expect to see changes in the index shortly. He talked about the summit they held at Yahoo! and said it was weird having Matt Cutts on the Yahoo! campus. The key thing is to solve the exploitation of publicly modifiable areas on prominent sites. He said nofollow is not a semantic tag; it's not descriptive of the content. Yahoo! recommends blocking certain components of the page. They are proposing <div class='content-public'>...</div>, where content within the tag is publicly contributed by anyone. He showed that you would put this tag around blog entries. Additional ones are <div class='content-nav'>...</div> and <div class='content-default'>...</div>. He then highlighted the nav and ads on the SEW site and said you would block out those. He said there is also the possibility of using link-level tags (more granular control), such as <a href="..." rel="content-public">. That is the Yahoo! proposal.
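As a rough illustration of what block-level markup would give an engine, the sketch below separates links found inside a content-public block (or carrying rel="content-public") from all other links. The class name BlockAwareLinkExtractor and the sample page are my own inventions, and the session did not say how Yahoo! would actually treat each block type, so this only shows the bookkeeping.

```python
from html.parser import HTMLParser


class BlockAwareLinkExtractor(HTMLParser):
    """Sort links by whether they sit in publicly contributed content (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.div_stack = []      # one bool per open <div>: is it content-public?
        self.public_links = []   # links inside content-public blocks or tagged per link
        self.other_links = []    # every other link on the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("content-public" in classes)
        elif tag == "a" and attrs.get("href"):
            rel_tokens = (attrs.get("rel") or "").split()
            # A link is public if any enclosing div is, or if the link says so itself.
            if any(self.div_stack) or "content-public" in rel_tokens:
                self.public_links.append(attrs["href"])
            else:
                self.other_links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


SAMPLE_PAGE = """
<div class='content-nav'><a href="/home">Home</a></div>
<div class='content-default'>
  <a href="/article">Article</a>
  <div class='content-public'><a href="http://spam.example/">pills</a></div>
  <a href="/more">More</a>
</div>
<a href="/guest" rel="content-public">guest link</a>
"""

extractor = BlockAwareLinkExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.public_links)
print(extractor.other_links)
```

The div stack is there so that a content-public block nested inside another block closes correctly and later links are no longer counted as public.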
Kaushal Kurapati from Ask Jeeves was next up; remember, Ask did not join forces with Google, Yahoo, and MSN. He gave a brief overview of Jeeves and how Ask Jeeves works. Crawler goals: (1) follow the robots.txt standard; (2) politeness: crawl delay, noarchive, noindex, nofollow; (3) efficiency: use compression methods and do not crawl duplicate pages. Indexing overview: they index HTML, PDF, Flash, MS Office, etc.; they achieve freshness through date stamping content; and completeness is important (site maps help). He gave some generic tips on how to use links and content. Challenges include JavaScript, dynamic pages, and long pages. They removed paid site submission. They say, don't buy links, it won't help. Do parked domains help? Nope. They want unique content. The trends for Jeeves: personal indexing with My Jeeves, which is a personal crawl (in a sense), and social tagging, which shows how people collectively refer to a page and gives more fodder for indexing.
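For readers who have not seen the first two crawler goals in practice, here is a small sketch using Python's standard urllib.robotparser module to check a robots.txt file and read its crawl delay. The robots.txt content, the ExampleBot name and the URLs are made up for illustration; this is not how Ask Jeeves's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to wait 10 seconds between
# requests and to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = load_rules(ROBOTS_TXT)

# A polite crawler checks each URL before fetching it...
print(rules.can_fetch("ExampleBot", "http://www.example.com/index.html"))
print(rules.can_fetch("ExampleBot", "http://www.example.com/private/page.html"))

# ...and sleeps for the requested number of seconds between requests.
print(rules.crawl_delay("ExampleBot"))
```

The noarchive, noindex and nofollow directives he mentioned are page-level signals rather than robots.txt rules, so they are not covered by this sketch.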
Eytan Seidman from MSN Search was last up. He was not asked to bring slides, so he worked from some notes. They have supported nofollow since about 2 weeks ago. They have full support for robots.txt and crawl delay. They first think about "discovering content" and whether they can leverage RSS to better discover new content. Once they have the content, how do they do a better job of interpreting it? He said that with email spam there is a community approach to blocking it; can we do the same with web spam? The last thing is that people in forums have been asking for more tools to see what was indexed, what was not indexed, and why. Please send feedback via the results page or contact page and keep it coming; they are reading....
Q & A: Danny asked the audience questions (percentages are all my estimates, I wonder if Danny got the same numbers):
- 40% in the room said they want better support of 301s
- Most said they want more feedback about their site, support
- 10% Express indexing
- 10% Many tools are stripping out referral info (toolbars)
- 90% Duplicate content handling
- 20% Domain identity, I have 50 domain names all pointing to the same place
- 40% Weather reports, tell us when you're changing the algorithms
- 0% robots.txt more standardized
- 0% on finding search result pages in the search results
- 5% nofollow stuff
- 2% on dynamic URL issues
- 0% trusted dates (page date stamping)
- 50% feel meta data should come back: is it coming back, should engines now support it more
- 40% are in favor of web spam reporting
Q: I asked a bit about block level link analysis based on Yahoo!'s proposal. A: Tim Mayer said they are moving somewhat in that direction.
Q: Nacho asked, how do we authenticate your crawlers? Sometimes people spoof the crawler. A: Tim Mayer said that you can authenticate via the IP address. Ask Jeeves agreed. Um, hire fantomaster's IP list.
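A minimal sketch of the IP check Tim described: compare the address a request came from against a list of networks known to belong to the crawler it claims to be. The crawler name ExampleBot and the address ranges are placeholders (reserved documentation ranges), not real search engine IPs; in practice you would need a trustworthy, maintained list.

```python
import ipaddress

# Hypothetical list: crawler name -> networks it is known to crawl from.
# 192.0.2.0/24 and 198.51.100.0/24 are reserved documentation ranges.
KNOWN_CRAWLER_NETWORKS = {
    "ExampleBot": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ],
}


def crawler_claim_is_valid(user_agent: str, remote_ip: str) -> bool:
    """Return True only if the user agent names a known crawler AND the
    request's IP address falls inside one of that crawler's networks."""
    address = ipaddress.ip_address(remote_ip)
    for name, networks in KNOWN_CRAWLER_NETWORKS.items():
        if name.lower() in user_agent.lower():
            return any(address in network for network in networks)
    return False  # the user agent does not claim to be a crawler we know


agent = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
print(crawler_claim_is_valid(agent, "192.0.2.44"))   # address inside a listed range
print(crawler_claim_is_valid(agent, "203.0.113.9"))  # spoofed: address not listed
```

The user agent string alone proves nothing, since anyone can send it; the IP comparison is what catches the spoofing Nacho asked about.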
Q: How about a relevance authority tag system, like eBay reviews, etc.? It would be an independent, authenticated score that you then evaluate quantitatively, followed by a qualitative assessment. A: Interesting ideas.
Q: Webby asked the next question. He thanked Google for the nofollow tag, but he still gets spam. Can you put a logo on a page that shows it's using the nofollow tag? A: Matt said it will take time and people will learn.
Q: How do the crawlers actually treat the nofollow tag? A: Matt said it's a good question. He said this is an abstention from voting. Google specifically does not allow its crawlers to follow those links AT THIS TIME. Tim added that when the agreement was made, the engines did not settle on how each of them would behave. Tim didn't answer the question; I think he didn't know.
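Matt's "abstain" description can be sketched as a link counter that simply leaves nofollow links out of the tally. This is only my illustration of the idea under that one assumption; as Tim pointed out, the engines did not agree on a common behavior, and the class name VoteCounter is hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class VoteCounter(HTMLParser):
    """Count each followed link as one vote; nofollow links abstain (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.votes = Counter()   # target URL -> number of vouched-for links
        self.abstained = []      # targets of links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, so check membership.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.abstained.append(href)
        else:
            self.votes[href] += 1


PAGE = """
<a href="http://www.example.org/">a site I recommend</a>
<a href="http://www.example.com/" rel="nofollow">discount pharmacy</a>
"""

counter = VoteCounter()
counter.feed(PAGE)
print(counter.votes)
print(counter.abstained)
```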
Q: Danny Sullivan said it's sometimes easier to send you a plain text document of the page instead of tagging everything (hinting at cloaking). A: Tim said it's a trust issue. MSN added that a programmatic method lets the engines, not the publisher, determine the content of the page.
Give me my "Link Love" - Matt Cutts's quote. Classic statement.
There were probably three questions about "commenting out the navigation". It's not commenting it out. It's basically telling the engines where your navigation is, so they can use it to better determine the content on the page versus the crawling of the page (links).