Cache Pages are Not Duplicate When Not Indexed

Jun 29, 2005 • 11:33 am | comments (0) by | Filed Under SEO - Search Engine Optimization

There is so much fear today around the term "duplicate content" that it really is disturbing. Many newcomers to the SEO game don't fully understand it. Basically, duplicate content is when two or more pages in a search engine index are similar enough to trigger a filter. If page A and page B are close in content (how close is the question), either page A or page B might be filtered out of the results. Which page gets filtered has a lot to do with which one has less link popularity, so a page with more linkage weight will probably not be filtered. That means if you write an article and it is syndicated on a site that has more linkage weight, your original article might be the one filtered from the results. We have had several dozen mentions of "duplicate content" at this site. More specific articles written by me and other authors include (and make sure to note the dates they were written): What is Duplicate Content by Aspen, Duplicate Content Penalty Timespan by Phoenix, and Duplicate Content - Resellers Ranking Higher by myself.
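To make the idea concrete, here is a minimal sketch of how such a filter could behave. It is not how any real search engine works: the shingle-based Jaccard similarity, the 0.8 threshold, and the `link_weight` field are all assumptions picked for illustration. It only shows the principle described above, that when two pages are close enough, the one with less linkage weight is dropped.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two texts, measured on their word shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_duplicates(pages, threshold=0.8):
    """Keep the strongest page from each group of near-duplicates.

    pages: list of dicts with hypothetical keys 'url', 'text', 'link_weight'.
    Pages are visited from highest to lowest link weight; a page is kept
    only if it is not too similar to a page that was already kept.
    """
    kept = []
    for page in sorted(pages, key=lambda p: p["link_weight"], reverse=True):
        if all(jaccard(page["text"], other["text"]) < threshold for other in kept):
            kept.append(page)
    return [p["url"] for p in kept]
```

With an original article (low link weight) and a syndicated copy of it (high link weight), this sketch keeps the syndicated copy and filters the original, which is exactly the situation that worries people.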

Now that we have some of that behind us, a WebmasterWorld thread named Google Lists its Own Cache Pages shows how frustrated and nervous people are about such a filter. The bottom line of that thread is that the Google cache shows an exact duplicate of a page (hence 'cache'). But since Google disallows bots from accessing it, search engines won't index it. If the search engines can't crawl or index the content, they will not even know it exists. If they do not know it exists, it is not considered a page and won't be considered duplicate.
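For anyone who wants to check this kind of thing themselves, the sketch below uses Python's standard `urllib.robotparser` to test whether a URL is off limits to crawlers. The robots.txt rules and the `example.com` URLs are illustrative assumptions, not a copy of Google's actual file; the sketch simply assumes cache pages are served under a path that the rules disallow.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not Google's real robots.txt):
# every crawler is told to stay out of anything under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: a cache-style page under /search, and a normal article.
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/search?q=cache:abc"))
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/article.html"))
```

A well-behaved bot that gets a "no" here never fetches the page, so the copy never reaches its index and there is nothing for a duplicate filter to compare against.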

Don't get me wrong, there is a lot to worry about with duplicate content, both with internal database-driven content and with syndication of content. Search engines are constantly tweaking their algorithms and 'duplicate content filters' to see what works best, and the filtering should improve over time. I will follow up this entry with a new thread that helps with some of the illegal syndication issues.
