Duplicate Content & Multiple Site Issues

Aug 20, 2008 • 7:16 pm | Filed Under Search Engine Strategies 2008 San Jose

More and more site owners are concerned that they might get penalized accidentally or overtly because of duplicate content. If you run mirror sites, will search engines ban you? If you have listings that are similar in nature, is that an issue? What happens if you syndicate content through RSS and feeds? Will other sites be considered the "real" site and rob you of a rightful place in the search results? This session looks at the issues and explores solutions.

Moderator: Amanda Watlington, Owner, Searching for Profit

Speakers:
- Mark Jackson, Search Engine Watch Expert & President & CEO, VIZION Interactive
- Mikkel deMib Svendsen, Creative Director, deMib.com
- Benu Aggarwal, Founder & President, Milestone Internet Marketing

Amanda: First up is Mark Jackson.

Mark: First, let's jump into what we are going to cover today.

- Why do search engines care about duplicate content?
- Identifying duplicate content on your site
- Correcting duplicate content
- Copycats

So why do search engines care?

Removing duplicate content lets search engines provide variety for users. It also addresses spammers who create millions of useless pages, and it helps the engines identify authority and ownership.

So how do you identify it? Google takes an automated approach to finding it and looks at identifiers that tip them off, such as similar or identical URLs and title tags.

How do you find it? Do a site:domain URL search; look at pages indexed across the search engines and see if there is a large disparity between Yahoo and Google. You can also just copy a phrase on your site and do a search! Same thing with blogs.

Here's an example (screenshot): you can see there are 9,380 copies of my article on SEW. But since SEW shows up first, it shows they own it and were the original publisher. You can run the same check on copyscape.com.
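The phrase-search and Copyscape checks described above can be sketched programmatically. Below is a minimal, hypothetical example (not from the session) of how a duplicate checker might flag near-identical pages: break each page's text into word shingles and compare them with Jaccard similarity. The page texts, shingle size, and threshold are all illustrative.

```python
# Hypothetical near-duplicate check using word shingles + Jaccard similarity.

def shingles(text, k=4):
    """Return the set of k-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Illustrative page texts: the second is a lightly edited copy of the first.
page_a = "Removing duplicate content allows search engines to provide variety for users"
page_b = "Removing duplicate content allows search engines to provide variety for their users"

score = jaccard(shingles(page_a), shingles(page_b))
print(f"similarity: {score:.2f}")
```

A score close to 1.0 suggests one page is a copy of the other; where you set the cutoff is a judgment call.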

Finding duplicate content on your own site: look for mirror web sites…how many domains do you own? Look for similar title tags and similar meta description tags. Look for pages that are light on indexable content, e.g. ecommerce sites with short product descriptions tend to have duplicate content.

Look for print versions of an article or page, "email to a friend" pages. Canonicalization issues. Session IDs (multiple URLs for the same content).

If the duplicate is a whole domain, redirect it with a 301 permanent redirect.
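As a rough sketch of that advice, assuming an Apache server with mod_rewrite enabled (the domain names are placeholders, not from the session), a 301 from a duplicate domain to the one you're keeping might look like:

```apache
# Hypothetical .htaccess on the duplicate domain (example.org):
# permanently redirect every request to the primary domain (example.com).
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

The 301 status tells the engines the move is permanent, so they consolidate indexing and link credit onto the primary domain rather than splitting it.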

If someone is copying your content, first determine if it's hurting you. Are you getting the credit/link? Look at the cache date to see if it was indexed first. Determine if it is worth your time to get the content removed.

Preventative measures: have your content copyrighted. If you hire someone to write content, make sure their content is unique.

Lazy content: by industry and by geographic region, i.e. the same content with just the city names swapped out.

When content is necessarily close, what do you do? Differentiate the title tags and focus on the first paragraph of copy.

Bottom Line: Duplicate content can hurt you. Remove, redirect, no-index. Deal with copycats efficiently and effectively. Don't be lazy.

Amanda: How many of you are in ecommerce? That's a really unfortunate place. This poses a fascinating challenge. Next up is Mikkel, who will deal with such issues.

Mikkel: there are unlimited ways you can create duplicate content!

1. Multiple domains – choose one brand domain. You can buy multiple, but implement a 301 redirect.
2. Sub-domains – make sure the content can only be accessed through one of these domains.
3. Test domains – always password protect them so they don't get indexed.
4. www vs. no-www – most engines seem able to handle common use of both, but if not, a solution is to redirect one-to-one.
5. Server load balancing – it confuses the engines; don't do it.
6. Secure and unsecure pages (http vs. https) – engines often mess this up, and links do not seem to benefit both. The solution is to use full URLs on navigation links if you have both versions. Also, redirect one-to-one.
7. Session IDs – a way of storing information rather than using a cookie. The problem is that the engines cannot handle this, and every time they come back they are assigned a new identifier. So dump all sessions into a cookie.
8. Permalink structure – especially if you blog using WordPress, you can set the way you want the URL structure to appear. There is a good plugin: Canonical URL.
9. Forum issues – different threads can be part of different URLs. When you can rate a thread, that adds more parameters to the URL, and now you have two separate pages with the same content. So do a redirect.
10. Sort-order parameters – the content gets indexed several different times. So redirect everything to one version of the page.
11. Breadcrumb navigation – a problem for shopping sites. You can end up reaching the same product through a few different categories, and because the breadcrumb nav is replicated in the architecture, you have the same content on two different pages.
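The www vs. no-www case in the list above is the one most sites hit first. A minimal sketch, again assuming Apache with mod_rewrite and a placeholder domain, of redirecting the www hostname one-to-one onto the bare domain:

```apache
# Hypothetical .htaccess: collapse www.example.com onto example.com,
# preserving the requested path so each URL redirects one-to-one.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

The same pattern works in reverse if you prefer the www hostname; the point is to pick one and 301 the other to it.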

Amanda: I work with a lot of clients that have serious content issues. You think you've solved one layer but then you find another way it's leaking through to the engines. So there are several ways for duplicate content to occur. Our third speaker will be Benu Aggarwal.

Benu: I took 3 problems that most businesses face:

1. Multiple domains, identical homepage, different URLs for the same content:

You can solve it in 2 ways: you can use Google Webmaster Tools to identify the preferred domain, or you can do a redirect.

Multiple entry points for the same content. You can solve this easily by adding more rewrite scripts.

2. Syndicating content – authenticating ownership of content.

Make sure you have easy access to edit meta-data and images. Use tools to check content, especially if you are getting massive amounts of content.

3. Website done in multiple languages.

Make sure meta data is absolutely unique on country-specific sites; don't just copy and paste the same meta tags.

Best practices to avoid duplicate content problems:

- Disallow folders in the robots.txt file that hold the same version of the site in a different format, e.g. print-friendly pages.
- Use preferred domain setup in IIS or Webmaster Tools. Work on redirects.
- Always use the same link to link to any page on your site.
- Syndicate carefully.
- Authenticate your content; use unique snippet content per page.
- Check and double-check your rewrites. Manage your URLs.
- Avoid publishing stubs.
- Use top-level domains to handle language-specific content.
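The robots.txt point in that list is simple to act on. A minimal sketch, with placeholder folder names standing in for wherever a site keeps its print-friendly or "email to a friend" duplicates:

```
# Hypothetical robots.txt: keep duplicate-format folders out of the index.
User-agent: *
Disallow: /print/
Disallow: /email-friend/
```

This only stops compliant crawlers from fetching those paths; it doesn't consolidate link credit the way a 301 does, so it's a complement to redirects, not a replacement.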

Session coverage provided by Sheara Wilensky of Promediacorp.
