Google: Content Stitching Or Quilting Is Not Near Duplicate Content

Jun 21, 2017 - 8:11 am

Dawn Anderson followed up with Google's Gary Illyes on what counts as near-duplicate content, asking whether it is the same thing as content stitching or quilting. As Dawn suspected, Gary said it is not. On Twitter, Dawn asked, "'Content stitching / quilting'... this is not the same as near-duplicate as defined in ur prev tweet?" and Gary responded that she was correct.

Dawn then sent me some more technical background on this. She noted that Marc Najork, who is now at Google, wrote a paper on the subject while at Microsoft, titled Detecting Quilted Web Pages at Scale. Here is the abstract:

Web-based advertising and electronic commerce, combined with the key role of search engines in driving visitors to ad-monetized and e-commerce web sites, has given rise to the phenomenon of web spam: web pages that are of little value to visitors, but that are created mainly to mislead search engines into driving traffic to target web sites. A large fraction of spam web pages is automatically generated, and some portion of these pages is generated by stitching together parts (sentences or paragraphs) of other web pages. This paper presents a scalable algorithm for detecting such “quilted” web pages. Previous work by the author and his collaborators introduced a sampling-based algorithm that was capable of detecting some, but by far not all quilted web pages in a collection. By contrast, the algorithm presented in this work identifies all quilted web pages, and it is scalable to very large corpora. We tested the algorithm on the half-billion page English-language subset of the ClueWeb09 collection, and evaluated its effectiveness in detecting web spam by manually inspecting small samples of the detected quilted pages. This manual inspection guided us in iteratively refining the algorithm to be more efficient in detecting real-world spam.
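The abstract does not spell out the algorithm itself, so what follows is only a toy Python sketch of why quilting and near-duplication are different signals. It rests on several assumptions: the helper names (`sentences`, `shingles`, `max_similarity`, `sentence_index`, `quilted_fraction`) and the five-page corpus are hypothetical, sentences are split naively on end punctuation, and near-duplication is approximated with word-trigram shingles and Jaccard similarity, a common textbook technique rather than anything Google or the paper confirms. It is not Najork's algorithm and makes no attempt to scale to a corpus like ClueWeb09.

```python
import re
from collections import defaultdict

# Hypothetical helpers for a toy, in-memory illustration. This is NOT the
# algorithm from "Detecting Quilted Web Pages at Scale"; it only contrasts
# a whole-page near-duplicate signal with a sentence-level quilting signal.

def sentences(text: str) -> list[str]:
    """Naive sentence split (assumption): break after . ! ? and normalize case/spacing."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of word k-grams, a common basis for near-duplicate detection."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def max_similarity(corpus: dict[str, str], page_id: str, k: int = 3) -> float:
    """Near-duplicate signal: highest whole-page Jaccard score against any other page."""
    target = shingles(corpus[page_id], k)
    return max(
        (jaccard(target, shingles(text, k))
         for pid, text in corpus.items() if pid != page_id),
        default=0.0,
    )

def sentence_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized sentence to the set of pages that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for pid, text in corpus.items():
        for s in sentences(text):
            index[s].add(pid)
    return index

def quilted_fraction(corpus: dict[str, str], index: dict[str, set[str]], page_id: str) -> float:
    """Quilting signal: share of a page's sentences that also appear on some other page."""
    own = sentences(corpus[page_id])
    if not own:
        return 0.0
    borrowed = sum(1 for s in own if index[s] - {page_id})
    return borrowed / len(own)

corpus = {
    "a": "Cats sleep a lot. They like warm places. Cats purr when happy.",
    "b": "Dogs bark at strangers. They need daily walks. Dogs are loyal.",
    "c": "Birds can fly south. They build nests in spring. Birds sing at dawn.",
    "d": "Cats sleep a lot. They like warm places. Cats purr when very happy.",
    # "q" is stitched together from one sentence each of pages a, b and c.
    "q": "Cats sleep a lot. Dogs bark at strangers. Birds sing at dawn.",
}

index = sentence_index(corpus)
for pid in sorted(corpus):
    print(pid,
          round(max_similarity(corpus, pid), 2),
          round(quilted_fraction(corpus, index, pid), 2))
```

In this toy corpus, page `q` has a low best-match similarity to any single page (about 0.12) even though every one of its sentences appears elsewhere, while `a` and `d` are near-duplicates of each other (0.75). The naive sentence count also gives `a` and `d` a fraction of 2/3 simply because they copy each other, which hints at why a real detector would need more than a raw count, for example checking that the borrowed passages come from several different source pages.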

There is no doubt that Google and other search engines are onto this type of behavior, but it is always nice to be able to point to research papers when we can. Thanks, Dawn.

Forum discussion at Twitter.