Managing Duplicate Content In a World Where Google Can Crawl JavaScript

Apr 30, 2008 • 8:01 am | Filed Under Database Driven & Dynamic Site SEO & Tips

Now that Google has admitted to crawling JavaScript and forms, SEOs and Webmasters need to be aware of how to manage even more duplicate content issues.

In the past, a good strategy was to build filter pages (filter by color, size, price, etc.) using JavaScript pull-down menus. Google would typically stay away from such forms, so you did not have to worry about Google seeing the same content filtered or sorted by color, price, size and so on.

But now with Google crawling JavaScript and forms, Webmasters need to take an extra step towards preventing Google from crawling and indexing such content. Why? Duplicate content.

A WebmasterWorld thread discusses this topic and offers tips for dealing with the problem. Some of the advice includes:

  • Put the duplicate content in an external JS file, assign it to variables, and write it into divs with innerHTML.
  • Use XMLHttpRequest (GET) to retrieve the data in XML format and then insert it into the page.
  • Use an Ajax POST to retrieve the XML content.
  • Use robots.txt to block specific files and/or page naming conventions.
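
The XMLHttpRequest suggestion from the list above might look roughly like the sketch below. The thread does not give code, so the endpoint path `/ajax/products.xml`, the `<name>` elements in the XML, and the helper names `buildFilterUrl` and `loadFilteredProducts` are all assumptions made for illustration.

```javascript
// Sketch only: the endpoint path, the <name> elements in the XML, and the
// filter names are assumptions for illustration, not taken from the thread.

// Build a stable query string so the same filter set always maps to one URL.
function buildFilterUrl(endpoint, filters) {
  const query = Object.keys(filters)
    .sort()
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(filters[key]))
    .join("&");
  return query ? endpoint + "?" + query : endpoint;
}

// Fetch the filtered listing as XML and render it into an existing element.
function loadFilteredProducts(filters, target) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", buildFilterUrl("/ajax/products.xml", filters));
  xhr.onload = function () {
    if (xhr.status !== 200 || !xhr.responseXML) {
      return; // leave the current listing in place on failure
    }
    const names = xhr.responseXML.getElementsByTagName("name");
    target.textContent = ""; // clear the previous listing
    for (let i = 0; i < names.length; i++) {
      const item = document.createElement("div");
      item.textContent = names[i].textContent; // textContent avoids HTML injection
      target.appendChild(item);
    }
  };
  xhr.send();
}
```

A pull-down menu's change handler could call something like `loadFilteredProducts({ color: "red" }, document.getElementById("results"))`. Because every filtered view comes from one endpoint path, that path is also easy to exclude with a single robots.txt rule, which matters now that fetching content through script may no longer keep it away from Google on its own.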

There are many ways to tackle the issue, but using JavaScript alone is no longer the best answer.
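
For the robots.txt route, a fragment along these lines is one possibility. The `/ajax/` directory and the `color`, `size` and `sort` parameter names are hypothetical and would need to match the site's real URL scheme. The `*` wildcard is honored by Google, but not necessarily by every crawler.

```
# Hypothetical paths and parameter names; adjust to the site's own URL scheme.
User-agent: *
# Block the endpoint that serves filtered listings to scripts.
Disallow: /ajax/
# Block filtered or sorted views of a listing page by query parameter.
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*sort=
```

Keep in mind that robots.txt stops crawling rather than indexing, so a blocked URL that is linked from elsewhere can still show up in results as a bare URL.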

Forum discussion at WebmasterWorld.

05/01/2008 04:43 pm

We have been using JavaScript to add 'hover over text': when your mouse is positioned over an image, a small pop-up definition appears. I'm concerned that if the crawlers are now viewing this JavaScript, along with the ALT tag definitions, this will be considered duplicate content on a specific web page. Any advice?

Michael Martinez

05/01/2008 04:49 pm

Google has been crawling Javascript for years, a fact that Matt Cutts confirms in a comment on his blog. I know I've discussed their ability to crawl Javascript on both Spider-food and Highrankings many times through the years. Google has made no secret of the fact that it's been able to parse escaped Javascript redirects for years, either. I reported the first Javascript crawler from Stanford University back in 2001/2002, as I recall (that post has been long deleted). No one should be surprised at Google's ability to parse Javascript. This is very old news. Yahoo! has been doing the same for years. Google only seems to be treating Javascript content a little differently now.
