Amazon's Cloud Spamming Google With Google's Cloud?

Feb 14, 2011 • 8:40 am | comments (7) | Filed Under Google Search Engine Optimization

Spotted via WebmasterWorld: the top two results for a search on [Amazon S3 Forum] actually come from Google's comparable cloud hosting service, Google App Engine. This is confusing, so let me step back.

Amazon is big into cloud computing services; one of those services is Amazon S3, which is popular for image and video hosting. Now, if you search for Amazon S3's forum in Google, you'd think Google would show you the result for forums.aws.amazon.com, but no. Instead, Google shows you a cloned, apparently scraped, version of that forum's content on Google's own cloud product, Google App Engine, at appspot.com.

Here is a screen shot:

[Screenshot: scraped results in Google's search results]

Compare the results on Amazon's domain to those on appspot.com:

[Screenshot: the forum content on forums.aws.amazon.com]

[Screenshot: the same content cloned on appspot.com]

Don't they look very similar?

It is funny that Google recently made a big to-do about their scraper algorithm catching issues like this, only to see Amazon's content for its own cloud service scraped and spammed onto Google's cloud service in Google.com's search results.

Why might this be happening? If you think about it, cloud services like these are used to house duplicate versions of your content for scalability. Maybe, just maybe, Google is more lenient with those services in the scraper algorithm.

Or maybe I have no clue what I am talking about - which is very likely here.

Forum discussion at WebmasterWorld.

Comments:

dz0ny

02/14/2011 03:02 pm

http://buyitnw.appspot.com is an anonymous proxy service: you can type http://buyitnw.appspot.com/www.cnn.com or http://buyitnw.appspot.com/www.seroundtable.com
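
For context, here is a minimal sketch of how a pass-through proxy like that could be built on App Engine, assuming the Python runtime's webapp2 and urlfetch APIs. buyitnw's actual code is not public, so the handler below is hypothetical and only illustrates the URL scheme described above:

    # Hypothetical sketch of an appspot.com pass-through proxy.
    import webapp2
    from google.appengine.api import urlfetch

    class ProxyHandler(webapp2.RequestHandler):
        def get(self, target):
            # A request for /www.cnn.com fetches http://www.cnn.com and
            # echoes the response back from the appspot.com domain.
            result = urlfetch.fetch('http://' + target)
            ctype = result.headers.get('Content-Type', 'text/html')
            self.response.headers['Content-Type'] = ctype
            self.response.out.write(result.content)

    # Route every path through the proxy handler.
    app = webapp2.WSGIApplication([(r'/(.+)', ProxyHandler)])

Because the proxied copy lives on a different domain, everything it serves looks like duplicate content to a crawler.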

Colin McDermott

02/14/2011 04:03 pm

Wow, Amazon forum robots.txt fail!

    User-agent: Googlebot
    Disallow: /
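
For anyone curious, the effect of that rule is easy to check with Python's standard-library robotparser (the forum URL below is just an illustrative path):

    # Verify that the quoted robots.txt rule blocks Googlebot entirely.
    import robotparser  # urllib.robotparser in Python 3

    rp = robotparser.RobotFileParser()
    rp.parse(['User-agent: Googlebot', 'Disallow: /'])
    # Every URL on the forum is off limits to Googlebot:
    print(rp.can_fetch('Googlebot', 'http://forums.aws.amazon.com/index.jspa'))  # False

Which would explain why Google ranks the scraped copy: it cannot crawl the original at all.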

Barry Schwartz

02/14/2011 05:06 pm

Ugh, that is bad.

Ryan Tate

02/14/2011 05:28 pm

But I would assume they still want the content searchable on Google's search engine.

    User-agent: Googlebot
    Disallow-Scraping: /

Ha!

Mark Barrera

02/14/2011 11:55 pm

Yeah, Amazon missed the boat on this one and Google is doing the right thing in this case since Amazon is blocking them from the content.

Johan

02/15/2011 01:34 pm

Very strange. Even now, the scraper site mentioned in the article is in the top 600 sites on the internet. Should we all be scraping sites to get that kind of traffic? Check out: http://www.webstatschecker.com/stats/domain/buyitnw.appspot.com

Coding Strategist

02/15/2011 02:22 pm

Thanks for pointing this one out. I am not sure yet what to think about it, but it does seem like an algorithmic glitch on Google's part. You would not think they would want to include anonymous proxy services in their index. These services only ever show duplicate content, since all they do is serve existing content from a different IP / domain.
