Duplicate content is not a fun topic for many SEOs and webmasters. Large dynamic sites that paginate their content make things a bit more complicated. Pagination of articles and, even worse, pagination of the navigational elements (category and subcategory pages) that lead to those articles can be a real hair puller for many webmasters. So what do you do?
I spotted one of the most detailed threads on this topic with some of the brightest SEOs and even a Googler discussing the various options. The Google Webmaster Help thread was started by Branko Rihtman (aka @neyne) from the SEO Scientist blog and Googler JohnMu came in to help. Not only that, former Googler and supreme SEO Vanessa Fox came in to help and ask more questions as well.
I won't get into the details, but I will summarize what is going on.
Branko gave an example of a site that paginates articles for monetization purposes. So instead of the article being on a single URL, it is broken up into several URLs, and goes from page 1, to page 2, to page 3 and so on. There is also a printer-friendly URL, which is currently blocked from being crawled by spiders. The thing is, he wants all the links to flow to page one of the article, even if someone links to the printer-friendly version or to page 3.
JohnMu tells him he is doing the right thing: ultimately it is best to have all the content on a single page, but if that is not possible, his current setup is fine. JohnMu gets into the pros and cons of various methods, including the canonical tag. That part of the thread alone is excellent.
But then Vanessa Fox comes in and talks about the challenges of applying this concept to navigation menus or filter lists: paginated category pages, subcategory pages and so on, with additional filter items. It is a fun topic for SEOs, at least the ones that love getting their hands dirty.
Let me pull out the key things I found interesting in the thread:
(1) Branko said he heard Google's Maile say at a conference "that in order for canonical to be counted, pages must have a sufficient amount of duplicate content and I am not sure it would work in this case."
(2) Google says, "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages." Even so, JohnMu still seems to recommend blocking in this case. John said, "It probably makes sense to prevent a print-version from being indexed (as it is now), but past that I don't think it's necessary to tweak it further with a rel=canonical."
(3) John does recommend using the "rel=canonical on the individual pages (pointing to the preferred URL for that particular page) to prevent duplicate content such as that found in the last."
(4) Vanessa "agrees that a 'view as single page' option is a good one for an article split into parts. I can see how some content owners wouldn't love it."
(5) For the paginated navigation and category-like pages, there is too much good conversation in the thread for me to summarize, so check out the thread at Google Webmaster Help.
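To make point (3) a bit more concrete, here is a minimal Python sketch of how a site might build the rel=canonical tag for each individual page. It is only an illustration: the thread does not say which duplicate URLs are involved, so the `?page=N` URL scheme, the `sessionid` and `utm_source` parameters, and the function names are all my own assumptions. The idea is that each page keeps its own preferred URL (page 3 stays page 3) while parameters that do not change the content are dropped.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "page" is the only query parameter that changes the content.
# Anything else (hypothetical session IDs, tracking tags) is dropped.
CONTENT_PARAMS = {"page"}


def canonical_url(url):
    """Return the preferred URL for one individual page."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    # Assumption: page=1 shows the same content as the bare article URL.
    kept = [(k, v) for k, v in kept if not (k == "page" and v == "1")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element that would go in the page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


print(canonical_link_tag("http://example.com/article?page=3&sessionid=abc&utm_source=feed"))
# <link rel="canonical" href="http://example.com/article?page=3">
```

Note that this does not point page 3 at page one. It only collapses duplicate URLs of the same page onto one preferred URL.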
Personally, if it were my site and I had to split up the articles, I'd use CSS to keep all the content on the same page. I'd have pagination just flip the content to page two, but not change the URL. I'd also use a printer-friendly stylesheet, so the URL is still the same even for the printer-friendly view. But sometimes SEOs can't force that on clients.
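Here is a rough Python sketch of what I mean, generating one HTML document for the whole article. Treat it as an assumption-heavy illustration and not a finished template: the class names (`part`, `active`, `pager`), the `showPart` script and the `render_article` function are all made up for this example. Every part of the article is in the same response, CSS shows one part at a time on screen, and a print rule shows all parts and hides the pager.

```python
from html import escape

# Hypothetical stylesheet: one part visible on screen, all parts in print.
STYLE = """
.part { display: none; }
.part.active { display: block; }
@media print {
  .part { display: block; }
  .pager { display: none; }
}
"""

# Hypothetical script: flip to another part without changing the URL.
SCRIPT = """
function showPart(n) {
  document.querySelectorAll('.part').forEach(function (el, i) {
    el.classList.toggle('active', i === n);
  });
}
"""


def render_article(title, parts):
    """Return a single HTML document that holds every part of the article."""
    sections = []
    buttons = []
    for i, text in enumerate(parts):
        css_class = "part active" if i == 0 else "part"
        sections.append(
            '<section class="%s"><p>%s</p></section>' % (css_class, escape(text))
        )
        buttons.append(
            '<button type="button" onclick="showPart(%d)">%d</button>' % (i, i + 1)
        )
    return (
        "<!DOCTYPE html>\n<html><head><title>%s</title>\n"
        "<style>%s</style>\n<script>%s</script></head>\n"
        '<body><h1>%s</h1>\n%s\n<nav class="pager">%s</nav>\n</body></html>'
        % (escape(title), STYLE, SCRIPT, escape(title),
           "\n".join(sections), "".join(buttons))
    )


page = render_article("Example article", ["Part one.", "Part two.", "Part three."])
```

Because there is only one URL, links to the article cannot be split between page 2, page 3 or a print version. How search engines treat the parts that are hidden on screen is a separate question that this sketch does not answer.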
Anyway, this thread is a must read for all SEOs.
Forum discussion at Google Webmaster Help.