Guide To Fixing Google Duplicate Content & Canonical URL Issues

Jan 8, 2007 • 11:51 am | comments (6) | Filed Under Google Search Engine Optimization
 

As the discussion on duplicate content drones on (to the dismay of some), it's only natural that some really excellent how-to guides come about. Part of the duplicate content discussion is the best way to avoid domain and canonical URL issues when linking internally, fixing typos in URLs, and handling redirects on your site. WebmasterWorld has one of the best technical guides that I have seen of late on how to fix the duplicate content issues in Google that arise from URL canonicalization problems.

For those who need a refresher on URL canonicalization, it is summarized as "choosing what single domain you want to use for your site, and what single URL should be used to request each of your pages." Having URLs that fall outside this standard can cause problems in the search engines, and part of this guide shows you how to fix that. According to jdMorgan, the following are some steps you can take to ensure you are not in territory where duplicate content would be flagged on your pages:

  • Canonicalize the domain (e.g. redirect non-www and IP address to www)
  • Canonicalize my index pages (redirect "/index.html" to "/")
  • Remove multiple slashes in the URL
  • Remove spurious query strings (my sites' pages are mostly 'static' with a few exceptions)
  • Fix-up common typos in type-in URLs
  • Fix-up invalid inbound links caused by bad HTML mark-up
  • Fix-up URLs resulting from bad copy-and-pastes
  • Fix-up outdated or otherwise incorrect query strings
  • Suppress the fix-up redirect if the resulting URL does not resolve to an existing file
  • Suppress the fix-up if the link is on my own site (In this case, I want to see the 404 error)
  • Suppress the fix-up if the remote user is me or a site tester (Again, we want to see the 404 error)
  • Avoid recursion in mod_rewrite running in a per-directory .htaccess context
  • Avoid the nasty mod_rewrite bug in Apache 1.3.x
  • Do all of the above using a single 301-Moved Permanently redirect
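
To make the first few items concrete, here is a minimal sketch of the canonicalization logic in Python. It is not jdMorgan's code (the guide works with mod_rewrite rules in .htaccess), and it only covers the domain, index page, multiple slash, and query string steps. The host www.example.com and the helper names canonical_url and redirect_for are hypothetical, and the sketch assumes every page is static, so any query string is treated as spurious.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical domain; substitute your own.
CANONICAL_HOST = "www.example.com"


def canonical_url(url):
    """Return the canonical form of url under the assumptions above."""
    parts = urlsplit(url)
    # Collapse runs of slashes in the path; an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    # Map ".../index.html" to the directory URL ".../".
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Assumption: pages are static, so the query string is dropped.
    # Any host (non-www, IP address) is replaced by the canonical one.
    return urlunsplit((parts.scheme or "http", CANONICAL_HOST, path, "", ""))


def redirect_for(url):
    """Return (301, target) if url is not canonical, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

Because all of the fixes are applied before the comparison, a request such as http://example.com//foo//index.html?x=1 is answered with one 301 straight to the final URL instead of a chain of redirects, which is the point of the last item in the list.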

Definitely worth a look at this thread. Continued discussion at WebmasterWorld - Guide to Fixing URL Canonicalization issues

 

Comments:

cvos

01/08/2007 10:51 pm

Watch capitalization in filenames. /PageName.asp and /pagename.asp can cause huge problems for spiders since they are technically different pages, even if IIS treats them the same.

Michael Martinez

01/08/2007 11:18 pm

And use the Google Webmaster Central canonicalization tool.

No Name

06/23/2008 08:56 pm

I have made a webmaster tool called Duplicate Content Fixer. You just input your domain and then choose with or without www. It creates a txt file that you then copy into the .htaccess. It's the quickest way to deal with it.

Neil

06/30/2009 01:04 pm

An insightful article. Please explain what you mean by “Fix-up invalid inbound links caused by bad HTML mark-up” and “Fix-up URLs resulting from bad copy-and-pastes” Thanks for the tips.

Jonathan Roseland

01/05/2011 07:13 pm

Boy, I'd really like to learn how to correctly pronounce canonicalization! lol

kdaymayday

06/04/2012 05:46 pm

what is your site called?
