Cloaking / IP Delivery Archives

Is Amazon.com Cloaking : Serving Google Different Content?

A WebmasterWorld thread has someone asking whether what Amazon is doing is considered cloaking. First, let me show you what they are doing.

Go to this page on Amazon.com. Notice, at the top right of the results is a "Sort by" drop down. Here is a picture:

Amazon Cloaking?

Now, go to the Google Cached version of that page:

Amazon Cloaking?

Yes, it is not there. It seems like Amazon served Google the page without the sorting feature, which technically is a form of cloaking. But is this done with intent to artificially boost their rankings?

The content seems exactly identical, minus just the sort by feature. The sort by feature can produce duplicate content and make it harder for Google to index the incredibly large site. Is this being done in a way that is trying to hide something?

If they really want to be white on white - they could now keep the sort by there and use the canonical tag for all the sort by options. But this might be something Amazon has done for years, prior to the canonical tag coming out.

Personally, I wouldn't consider this against Google's cloaking guidelines, but I am not Google.

Forum discussion at WebmasterWorld.

posted rustybrick in Cloaking / IP Delivery at January 13, 2010 7:58 AM Comments (3)

Stop Spiders From Crawling Your Site on Shabbat, Including GoogleBot

A Google Webmaster Help thread has an interesting discussion around blocking your site from coming up for both visitors and search engine crawlers on Shabbat (the Jewish Sabbath, on Saturday). This is not a new topic; we discussed using cloaking for religious Shabbat purposes in the past.

In short, some observant Jews do not want their site to be accessible on Shabbat, which runs from sundown Friday night to nightfall Saturday night. The issue on the SEO front is, if you turn off your site, what happens to the search engine crawlers? Do they get 404 pages and drop your site from the search index?

Phil Payne posted an answer to how one can handle this, which Googler JohnMu said was a good answer. Phil said:

Yes - a 503 is the correct server response for "We're closed". If you substitute a normal HTML page saying "We're closed" and serve a 200 it's very likely to get indexed by Google.

If you give the Googlebot a 503, it will just go away and come back later without indexing what you give it.

For humans, you can serve a custom 503 page that explains the situation. Are there no other Orthodox sites you can ask, to see how they do it?
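As a rough sketch of what Phil describes, here is a tiny Python WSGI app that answers every request with a 503 and a human-readable "We're closed" page. The names `closed_app` and `RETRY_AFTER_SECONDS`, the one-hour value and the page wording are all assumptions for illustration. `Retry-After` is a standard HTTP header, but the thread does not say how any crawler treats it.

```python
# Minimal sketch: serve a custom "closed" page with a 503 status, not a 200.
# RETRY_AFTER_SECONDS and the page text are illustrative assumptions.
RETRY_AFTER_SECONDS = 3600


def closed_app(environ, start_response):
    """WSGI app that answers every request with 503 Service Unavailable."""
    body = (
        b"<html><body><p>We are closed for Shabbat. "
        b"Please come back later.</p></body></html>"
    )
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Hint for clients about when to try again (standard HTTP header).
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
    ]
    start_response("503 Service Unavailable", headers)
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, closed_app).serve_forever()` would do. The point is only that the status line says 503 while the body still explains the situation to humans.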

Now, Friday night here is not the same as Friday night where you are. So detecting the location of a visitor is key here. There are services like Saturday Guard that do this for you, but I am not sure how they handle search bots.

Technically, the issue, as far as I understand it (I am not a Rabbi, but I am an observant Jew) is that they do not want to earn money on Shabbat or Jewish holidays. Some hold that since the money doesn't transfer from the merchant account to the bank that day, then there is no money being earned technically that day. But some do not hold that way or some want to be extra careful. If it is a matter of money, then just turn off the "add to cart" and shopping cart features for the site.

If they do not want any activity on their site by potential customers, then I guess a 503 is a good answer. But are search engine bots customers? No. I suspect most Rabbis would be okay with spiders or automated crawlers using the site on Shabbat. The issue then is, are you allowed to serve up a 503 page to a visitor and not to a crawler? That might be against Google's terms of service and fall within the bad cloaking policies.

If the issue is about the server actually working on Shabbat, then a 503 cannot really be served up at all, because you would technically need to power down the server, and without a server to send the 503 response code, you have nothing.

This is a complex issue that I personally never had to deal with on sites that we have built. But it would be interesting to see what to do in the case of turning off a web server. There isn't much Google can do here.

Forum discussion at Google Webmaster Help.

posted rustybrick in Search Engine Optimization at September 9, 2009 8:57 AM Comments (3)

Google's Matt Cutts On Using Different Interfaces For Mobile Users

The topic of cloaking, IP delivery or user-agent delivery is always a very touchy topic in the SEO industry. I am not going to get into the history, but in short, webmasters can use various methods to show GoogleBot one piece of content and the user a different piece of content. Now, there is a gray area in that space. For example, hiding certain links or content from GoogleBot while showing it to searchers, and at the same time showing the primary content to GoogleBot. That is why this is a touchy topic: Google wants to take a hard stance against cloaking and forms of it, but at the same time, there are very valid reasons for it.

In a recent video by Matt Cutts he discusses why showing a mobile version of a web site is 100% okay by Google. In short, as long as you show GoogleBot the same site normal web browsers see, then you are okay. Having a mobile version or print version of your site is fine, just don't show it to GoogleBot. Here is the quick video:

A week or two ago, I went through, in detail, how I implemented this for my corporate site. You can read about it over here.

Forum discussion at Google Webmaster Help.

posted rustybrick in Google Optimization at September 4, 2009 8:30 AM Comments (0)

Case of Accidental Search Engine Cloaking

Gab Goldenberg from SEO ROI posted a thread at Google Webmasters Help discussion forums when he noticed Google began to drop his rankings for all his pages. I have been tracking the thread for about three weeks, but held off on posting about it until Gab wrote his post.

How I Cloaked My Way To LOWER Rankings was Gab's title and it shows how you must be careful about who you request site changes from. One WordPress plugin conflicted with a custom plugin and it turned out to generate cloaked-like pages in the eyes of Google. Google would see virtually the same title, content, etc for all the pages on the site, while a human would see unique pages.

Clearly, this type of duplicate content issue, with cloaking, is a recipe for disaster in the search engines.

Thanks to the good help of the folks at Google Webmasters Help and for Googler JohnMu pointing out the issue, Gab is able to address the issue and fix his site, which will fix his rankings.

Lesson learned: Test your changes in a test environment and try your best to understand how the changes will impact your pages. Sometimes things slip through, like in this case, but don't be afraid to ask for help.

Forum discussion at Google Webmasters Help.

posted rustybrick in Cloaking / IP Delivery at May 14, 2009 8:17 AM Comments (0)

Webmasters Skeptical But Loving New Canonical Search Engine Tag

Yesterday, Google, Yahoo and Microsoft announced together a new way to handle internal duplicate content issues with a new "canonical" header tag. Vanessa Fox does an excellent job explaining what it is all about in her piece at Search Engine Land.

So for all duplicate pages, you insert this tag in the header elements of those pages, specifying the main URL. The tag looks like this:

<link rel="canonical" href="http://www.example.com/true-url.html" />

Google, Yahoo and Microsoft have detailed explanations of how they work.

Three main things:

(1) This works only internally, not across domains.
(2) Treat this like you would a 301 redirect, so be careful
(3) Search engines consider this a "hint" and do not have to abide by it (just yet)

Outside of that, there are good recaps on this at Techmeme.

We have a ton of Q&A on this from our live coverage of the Ask the Search Engines panel from SMX West. I am sure your questions are answered in that panel or in the discussions below.

This tag can be confusing, because it is new. But after webmasters begin to understand where, if and how to use it, they are more likely to love it.

JohnMu said in a forum post:

Here are some examples where this could be used:
- Web-shops (multiple URLs depending on how you got to a page)
- Sites that work with Session-IDs within the URL
- Ad-tracking URLs (eg using AdWords + Analytics)
- Affiliate tracking URLs
- News sites with multiple URLs per article
- Forums with multiple URLs per thread/page (eg "&highlight=", etc)
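To make those cases concrete, here is a small Python sketch that strips session, sorting and tracking parameters from a URL and prints the matching canonical tag. The parameter names in `NON_CANONICAL_PARAMS` and both function names are hypothetical; which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
NON_CANONICAL_PARAMS = {
    "sessionid", "sort", "highlight", "affid",
    "utm_source", "utm_medium", "utm_campaign",
}


def canonical_url(url):
    """Return the URL with non-canonical query parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page's header."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


print(canonical_link_tag("http://www.example.com/true-url.html?highlight=foo"))
```

Every duplicate URL then points at the same main URL, which is the behavior the tag is meant to signal.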

Plus, Yoast already posted plugins to support this for WordPress, Magento and Drupal.

Forum discussion at Google Webmaster Help, Cre8asite Forums, WebmasterWorld and Sphinn.

posted rustybrick in Search Engine Optimization at February 13, 2009 9:25 AM Comments (6)

How Can You Block US Visitors But Not Block GoogleBot?

A Google Webmaster Help thread discusses a unique issue to a few niche sites that have no choice but to block all users from the United States while also wanting to allow GoogleBot to access the site.

Since GoogleBot lives in the US, blocking US-based IP addresses would likely also block GoogleBot. That would result in Google not indexing your site and you not ranking well in Google for the country you are targeting.

Googler JohnMu shares one acceptable way to block US-based users but, at the same time, allow GoogleBot to access your site. I assume this is not considered JavaScript cloaking, because John said you can use it.

One potential solution would be to use a JavaScript-based interstitial that verifies the IP address and otherwise blocks access to your site. I assume you have to use JavaScript within your site, correct? If so, there would be no simple way for a user to selectively block the JavaScript interstitial and allow the JavaScript casino content. Assuming the JavaScript is in an external file that is disallowed through your robots.txt file, Googlebot would not be able to view the interstitial and would be able to crawl the site normally.

This solution isn't necessarily new, but it is the first time I have seen a Googler suggest it in a forum.

Forum discussion at Google Webmaster Help.

posted rustybrick in Google Optimization at December 11, 2008 8:00 AM Comments (0)

Is Apple Cloaking Their iTunes Content, With Google Looking The Other Way?

Over the past few weeks, my brother and I have been working on a side project at RustyBrick on building out iPhone Apps. During this process, I took detailed notice of how iTunes works, how their API functions, and how Google indexes that content, and it has raised some questions in my mind.

Let me step back and take you through this. We built an iPhone or iPod Touch application for the Jewish community. It is called Siddur, which is a Jewish prayer book. In short, it has Jewish prayers and tools to aid in those prayers. The community loves it, so I wanted to share the "reviews" that are on the iTunes Store with everyone, so we looked into using the API or XML from the iTunes store. As you can see on the iPhone Siddur, we added customer reviews pulled dynamically from the XML. How did I find the XML?

When we were looking at some Google search results, I discovered this result. If you click on the link, it actually will open up iTunes on your computer but if you click on the cache link, it shows you the content you would find in the iTunes application.

Screen Shot Search Result:
Google and iTunes

Screen Shot of iTunes App in Store:
Google and iTunes

Screen Shot of Google Cache:
Google and iTunes

So I did some forum research, to find an old WebmasterWorld thread. The thread talks about Apple's relationship with Google but then interestingly enough has a link. The link is http://google.com/itunes, which then links to http://services.google.com/marketing/links/itunes. Now, that is interesting, but I can speculate on it or it can be something that is 100% unbiased and not "evil." Update: It appears that the Google iTunes link in this paragraph no longer redirects to Google AdWords. It did last night and it did in 2005. Update 2: Matt Cutts of Google explained below that the google.com/itunes link was an old promotion for music labels. Basically, music labels received a promo to sign up with Google AdWords to promote their music. The promo is no longer valid, so Google dropped the link. So it seems totally unrelated to this story.

So why am I uncomfortable with this? Well, not everyone has iTunes on their computer. By listing the content of what is found in iTunes, in Google, as if it was a document accessible on the web... Well, that seems not so useful. Why not label these results as "iTunes is required," or something like that? Why not let other developers build this into Google through Sitemaps?

Yes, I know Apple provides these iTunes hyperlinks so people can easily send the link to friends to download music, movies or apps - but again, these are not real "web documents." Or maybe, I am just being too picky?

Forum discussion at WebmasterWorld.

posted rustybrick in Cloaking / IP Delivery at August 22, 2008 9:13 AM Comments (8)

Updated: Google Says Blocking Countries Outside of the US is Against Policies

A Google Groups thread has a webmaster who has been receiving a lot of rogue spider attacks from the Africa region. He wants to go as far as ban the whole continent of Africa. But he is concerned that by doing so, he will also hurt his Google rankings.

It is actually not all that uncommon for network administrators to block specific regions of web traffic. In fact, I believe my office blocks the Asia and Africa regions from entering our network (not this site, but my office network). We pretty much banned that whole region, because we have no reason to allow those regions in (in most cases, but things have come up).

Would blocking the whole of Africa hurt this guy's search rankings in Google?

Googler JohnMu stepped in to say that blocking an entire region would "be considered cloaking" and would be against Google's Webmaster Guidelines. Got that? If you block specific regions of traffic, like everyone outside of the US, that is cloaking and against Google's guidelines.

Do I agree with this policy? In many cases, no. If your site is local in nature and having visitors from outside a specific region doesn't make sense for your bandwidth bill, then it is up to the site owner to make that call. Of course, there may be users outside of a specific region that are your target audience, but in many cases people take the route of percentages and are willing to have some collateral damage.

John does give some excellent advice, advice that is not as easy as blocking a whole region via the router, but good advice in any event. He said that instead you should "add blocks based on the user's activity, not based on his location." Of course, then you need to build algorithms and software that detect certain activities and block them based on that activity. More tips on that type of detection here.

Forum discussion at Google Groups.

Update: Danny Sullivan and I talked to Google about this. Google's revised statement on this is summed up in a comment from Danny, where Google said:

As long as the web server always blocks IPs from (say) Africa, it's not doing anything special/different for Googlebot, and so it wouldn't be considered cloaking, but geolocation instead.
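Here is a minimal Python sketch of that distinction: the block decision looks only at the visitor's IP address and never at the user agent, so Googlebot gets the same answer as anyone else coming from the same range. The ranges in `BLOCKED_NETWORKS` are reserved documentation addresses used as placeholders, and `is_blocked` is a hypothetical helper, not anything Google published.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), standing in for
# whatever region you decide to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(remote_ip, user_agent=None):
    """Return True if the IP falls in a blocked range.

    user_agent is accepted but deliberately ignored: treating Googlebot
    differently from other visitors is what would turn this into cloaking.
    """
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


# Same IP, different user agents, same decision.
print(is_blocked("203.0.113.9", "Googlebot/2.1"))
print(is_blocked("203.0.113.9", "Mozilla/5.0"))
```

The moment the function starts branching on `user_agent`, you are back in the territory the original statement warned about.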

Plus, Matt Cutts of Google gives a little more on what happened here:

Yup, what Danny said. The downside of doing a lot more talking to webmasters and site owners is that sometimes we'll misspeak, but I'd much rather have that problem and sometimes need to clarify than not be talking to webmasters as much. Barry, thanks for highlighting this, and JohnMu, thanks for always being willing to answer questions in the Google webmaster discussion group.

We have posted a new article on this retraction over here.

posted rustybrick in Google Optimization at July 2, 2008 7:40 AM Comments (18)

Google Ignites Cloaking & IP Delivery Debate

The Google Webmaster Central Blog posted a blog post named How Google defines IP delivery, geolocation, and cloaking. In the geolocation section, they cover topics such as making sure to treat Googlebot as a California resident, and they go over IP delivery, cloaking and the Google News First Click Free program.

I won't go through the blog post; you can read it at Google. I will, however, link to the related Google Groups thread and show you how this post started yet another cloaking and IP delivery debate.

You can also watch a video that summarized the blog post:

Here are select quotes:

Hopefully this is legit, but if not, how in general should browser specific HTML generation be handled? Should we pretend that robots are IE7 or Firefox, for example?
Re: cloaking: What about REST and the idea of different resource representations that is so prevalent throughout the web? Saying that one could run a MD5 hash on a resource url to determine a change in content violates the principles of REST, which states that resources can exist at the same url in different representations, the nature of the representation returned or received being determined by content negotiation?
Is what I'm doing really cloaking? I'm only trying to make it easier for Google to (efficiently) crawl the site which contains over 30 million posts. I'm not in any way trying to manipulate my rankings.

The cloaking debate never seems to end.

Forum discussion at Google Groups and WebmasterWorld.

posted rustybrick in Cloaking / IP Delivery at June 3, 2008 9:52 AM Comments (0)

Google Offers Advice on Automatic Redirection Based on Geolocation & Language

A Google Groups thread has discussion around how a webmaster should handle redirecting web visitors to a localized version of their site, based on the IP address information.

Since I recently wrote a comprehensive article at Search Engine Land on How Search Engine Redirect Users To Country-Specific Sites, I thought this post was particularly interesting.

Googler, JohnMu, was first to respond to the webmaster's question. He explained that as a user, he wants the option to either see a localized version or a different version. John said, "I live in the German-speaking part of Switzerland but often browse the web in English. If I go to a search engine and search with English queries for English pages, I do not want to be redirected to a translated version which the website thinks I would like to see." For John, he would think that you should "allow the user to switch [to any version] with a simple link."

On the SEO front, John explains that if you offer these localized versions via different URLs (unlike how Microsoft's Live Search works, interesting), then the search engine will pick up the content and help direct the different regions to your localized version for you. Let me quote John:

This allows our crawlers to find those pages and - should the user accidentally search for and click on the wrong one - lets the user move to a different language version on demand as well. By allowing our crawlers to crawl the various versions, we'll be better suited to suggest those URLs to users from those regions as well.

Of course there are many advantages to using IP geo-location techniques for your web visitor. But you need to judge the pros and cons of your specific site's goals and decide what works best.

Forum discussion at Google Groups.

posted rustybrick in Google Optimization at May 22, 2008 8:09 AM Comments (2)

How Google Views Four Ways Of Hiding Content

I would consider this Google Groups thread a "gem thread," since you don't always see a thread at this level - it is precious. A Googler, Wysz, discussed in detail four methods of hiding content and how Google may interpret each method.

JavaScript-Only Navigation: Wysz explains that this tactic does not fool or confuse search engines, so it likely won't hurt you in your rankings but from an "accessibility perspective," he says it is "not desirable." Wysz then goes on to answer specific questions posed by a webmaster, which you may gain some clues from.

CSS-Enhanced Navigation: Seems like Google and Wysz both love this method. Wysz said, as long as you do not have "intent to deceive search engines," then you should be fine. On the accessibility front, it is a win-win, "since it degrades gracefully as JavaScript and CSS support are removed," Wysz explains. Wysz adds, "Google should be able to follow these links and rank your pages normally."

Hidden Links via Positioning/Color, for Design/Accessibility: Wysz explains that this method can bring you "dangerously close to a grey area." The example he gives is the word "SkipToContent," which "isn't likely to be interpreted by anyone as an attempt at deception," said Wysz. He then adds, and this is important, "unless the term "SkipToContent" becomes a highly competitive keyword." Wysz does go on record saying, "If implemented in a non-deceptive manner, these aids should not cause a problem." But that leaves it up to Google to decide, and intent is not always easy to judge. So, try not to use this method, if possible.

Hidden Links with No Mention of Accessibility or User Value: I think I will just quote Wysz here, cause he said it best. :)

I'm going to assume that these links are only intended for bots to see as an attempt to deceive search engines. That's probably not an assumption you want a Googler to make. When making this judgement on your own, just ask yourself this question: "Is all of this text here for the user?" If you want to make Google (and your users) happy, the answer should always be "Yes."

Forum discussion at Google Groups.

posted rustybrick in Google Optimization at April 25, 2008 7:53 AM Comments (0)

I'm Deindexed... Now What?

A High Rankings Forum member has lost his high rankings (no pun intended). Apparently, he's been deindexed since January 2008. His first inclination at this point is to check the IP address, assuming everything else is looking good.

Is the IP really the right direction to go? As one member puts it, it can help, but moderator Randy has a different take.

As a general rule the search engines do not ban IP numbers. There are simply too many sites out there on shared hosting plans, so if they did they'd end up throwing out a lot of babies with the bathwater.

So what can you do instead? First, check if the search engine spiders are visiting at all. If they're not, you might be right about the IP blockage. But also investigate other possible reasons for being deindexed. Here's what you can do to check:

  1. Make sure your robots.txt file isn't blocking spiders.
  2. Browse your site as Googlebot and see what the spider sees.
  3. Investigate files and ensure they aren't hacked.
  4. Run Xenu on your website and check any external links
  5. Register your site with Google's Webmaster Tools to check for red flags
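For the first item on that list, here is a quick Python sketch that checks whether a robots.txt file would block a given spider from a URL, using the standard library's `urllib.robotparser`. The helper name `can_crawl` and the sample rules are made up for illustration, and Python's parser is a simple implementation that may not match every rule the way a real search engine does.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that accidentally blocks everything.
blocking_rules = "User-agent: *\nDisallow: /\n"
print(can_crawl(blocking_rules, "Googlebot", "http://www.example.com/page.html"))
```

In practice you would paste in the contents of your own robots.txt, or point `RobotFileParser.set_url()` and `read()` at the live file, and test a few of the URLs that dropped out of the index.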

What else would you add to this list?

Forum discussion continues at High Rankings Forums.

posted Tamar Weinberg in Cloaking / IP Delivery at March 25, 2008 9:27 AM Comments (6)

Live Search's Spam Fake Referrals Were Cloaking Tests, Says Microsoft

I have been corresponding with Microsoft about the weird spam-like referrals Live Search was sending to hundreds, if not thousands, of web site log files throughout the web. On September 6th, a Microsoft representative confirmed that these were actual tests being conducted by the Live Search team - but did not expand upon that. Since these tests continued to linger on until even today, Microsoft has shared more details with us about what exactly this test is.

The answer is, Microsoft was testing for cloaking. A post by the Live Search team scheduled to go live at 3PM (EST) named Live Search and Cloaking Detection has all the details. In short, Microsoft explains that "one of these tools is an extension to MSNBot, giving us an additional way to detect cloaking."

But as I reported three times in the past by way of the WebmasterWorld thread, these tests were wreaking havoc on log files, causing concern and questions as to where these referrals were coming from and why. So the answer is, it is a form of MSNBot used to detect cloaking.

Microsoft has now promised that this MSNBot will not impact your AdSense/Overture reporting, will not statistically impact your site statistics with unfilterable bot traffic, will not continue to "pollute" your HTTP logs with inappropriate terms (spam keywords), and Microsoft will respond to your questions in their forum or via this form.

I asked Microsoft a few questions about their announcement, here is the Q&A:

Q: How have you come up with a way "to optimize the crawler and most webmasters should notice the referrer traffic dropping to almost nothing over the next month." Will traffic still be inflated overall, but you won't pass a referral data?
A: Webmasters might still see some referral data being passed, but the keywords will be relevant to their sites, and it should not be statistically significant for any sized website. If webmasters are continuing to see issues, we recommend they contact us through our forums or feedback form.

Q: Why did Microsoft use spammy keywords as the referrer data in your initial tests?
A: We were using a common list of terms to test against all websites when we first launched this tool. We have now optimized the tool to use only keywords that are relevant to your website.

Q: Have you consulted with Google or Yahoo on how they handle the cloaking issue? If so, can you provide any details on that? Would you say you handle the cloaking issues the same way?
A: There are some commonalities in how all the major search engines detect cloaking, however, we can only comment on our own system.

Forum discussion continued at WebmasterWorld.

posted rustybrick in Microsoft MSN Search at December 4, 2007 3:00 PM Comments (0)

Customizing Landing Pages Based on User Intent Gleaned from Search Results

An interesting Cre8asite Forums thread talks about an old topic on determining user intent based on search results, and then serving up a customized page to that user, based on his or her intent.

G-Man asks, without "search engine data" but with scraped results, can you still deploy this tactic of customizing based on user intent?

Ammon Johns believes you can. Just like you would with your own search referral data, you can use the scraped results to determine user intent. Ammon explains, "It is perfectly possible to determine a lot about intent, and motive, from a user by the method they use, either in keywords, or even by other referral data. It is simply about context and empathy at heart."

The approach is to classify the types of keyword phrases into buckets of user types. User types might include ready to buy, just browsing, looking for best price, looking for best customer service, and so on. When you determine your users, typically through user personification, you can then associate keyword types to each persona. Then you direct each persona to a landing page based on the keyword search.
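As a rough illustration of the bucketing idea, this Python sketch pulls the search phrase out of a referrer URL and maps it to a persona. Everything here is hypothetical: the persona names, the trigger words, and the assumption that the search phrase arrives in a `q` query parameter.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical personas and trigger words; first match wins.
PERSONA_KEYWORDS = [
    ("ready_to_buy", {"buy", "order", "purchase"}),
    ("best_price", {"cheap", "discount", "price", "deal"}),
    ("best_service", {"support", "warranty", "returns"}),
]
DEFAULT_PERSONA = "just_browsing"


def search_terms(referrer):
    """Extract lowercased search words from the referrer's 'q' parameter."""
    query = parse_qs(urlsplit(referrer).query)
    return query.get("q", [""])[0].lower().split()


def persona_for(referrer):
    """Pick the first persona whose trigger words appear in the search phrase."""
    terms = set(search_terms(referrer))
    for persona, triggers in PERSONA_KEYWORDS:
        if terms & triggers:
            return persona
    return DEFAULT_PERSONA


print(persona_for("http://www.google.com/search?q=buy+red+widgets"))
```

The persona returned would then decide which landing page variation to show. A real setup would need a far richer keyword list than three small sets.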

The thread goes into this in more detail.

Forum discussion at Cre8asite Forums.

posted rustybrick in Search Engine Optimization at June 7, 2007 7:04 AM Comments (0)

Is Yahoo! Autos Cloaking?

A WebmasterWorld thread links to a post at Agerhart.com showing screen shots of Yahoo! Autos cloaking.

Cloaking is when a search bot is given one page of content, while a normal user is given another set of content.

If you go to http://autos.yahoo.com/used-cars/forsale.html and compare it with the Google Cache version, to me they look identical. So possibly, Yahoo! changed it. But in the screen captures, only the Google version had the "used cars" anchor text by every state break down. You can see the before and after at Agerhart.com.

It seems like Yahoo Autos is currently not cloaking at this moment.

Forum discussion at WebmasterWorld.

Update: Tim Mayer of Yahoo! has confirmed on the May 22nd edition of The Daily Search Cast that Yahoo Autos has changed the page since this was reported. So, Yahoo Autos was cloaking. FYI, this wasn't the first time Yahoo! was caught cloaking; they also were spotted using unethical search practices in July 2005.

posted rustybrick in Cloaking / IP Delivery at May 22, 2007 8:05 AM Comments (4)

How Important is the IP Address for SEO?

The question about IP address and SEO is revisited time and time again. How beneficial is having a shared IP versus a dedicated IP? A Search Engine Watch Forums thread tackles this question.

1. What happens when one web site gets banned and in doing so penalizes the IP?
2. Is it fair that a web site using ethical SEO techniques is affected because it is being hosted in a "bad" neighborhood?
3. Does interlinking websites on the same shared IP cause search engines to view them as a cluster and consequently lose link weight? i.e. links originating from a single C Block.

A few observations have been thrown out. A number of comments have been mentioned to reflect that Google is not as concerned about the IP address and focuses more on the domain name. However, it is observed that Google would rather you not link to the same sites if they reside on the same C-class.

Another person quotes Google:

"Actually, Google handles virtually hosted domains and their links just the same as domains on unique IP addresses. If your ISP does virtual hosting correctly, you'll never see a difference between the two cases. We do see a small percentage of ISPs every month that misconfigure their virtual hosting, which might account for this persistent misperception--thanks for giving me the chance to dispel a myth!"

-Google Director of Technology Craig Silverstein Slashdot interview

Ian McAnerin, moderator, adds that a shared IP is fine if you do not do anything shady:

For the average site, my experience is that it really doesn't matter if it's shared or not. It only becomes an issue if you begin to push the spam envelope, at which point I would suggest you have other problems other than your IP...

I invite you to read a blog post from my BFF Lisa that was written two months ago that also addresses this question. In the post, Lisa says that it's best to be on a dedicated IP.

Forum discussion at Search Engine Watch Forums.

posted Tamar Weinberg in Search Engine Optimization at May 18, 2007 9:22 AM Comments (5)

Verify The Bots Accessing Your Site: Is Google.com Sending That GoogleBot?

There is no doubt that a ton of the bot activity on one's sites is from rogue spiders: spiders or bots that pretend to be legit bots but are there to steal your content. We have covered several sessions on this in the past.

A new Cre8asite Forums thread asks how one can verify that GoogleBot is really from Google.

Matt Cutts posted a detailed How to verify Googlebot back at the Webmaster Central Blog on 9/20/2006 explaining how to do reverse DNS and then a forward DNS->IP lookup.

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.
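If you want to code it yourself, the two-step check Matt describes can be sketched in Python roughly like this. The function name `verify_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are assumptions added so the logic can be tested without touching DNS; by default they use the standard library's `socket.gethostbyaddr` and `socket.gethostbyname_ex`.

```python
import socket


def verify_googlebot(ip,
                     reverse_lookup=socket.gethostbyaddr,
                     forward_lookup=socket.gethostbyname_ex):
    """Reverse DNS, check the googlebot.com domain, then forward DNS back to the IP."""
    try:
        # Step 1: reverse lookup, IP -> host name.
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the name must sit inside the googlebot.com domain.
    if not hostname.rstrip(".").lower().endswith(".googlebot.com"):
        return False

    try:
        # Step 3: forward lookup, host name -> list of IPs.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # A spoofed reverse record fails here, because the real
    # googlebot.com name will not resolve back to the spoofer's IP.
    return ip in addresses
```

Calling `verify_googlebot("66.249.66.1")` performs live DNS lookups, so the result depends on your resolver. Since each check costs two lookups, caching the answer per IP is probably worth it on a busy site.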

Of course there are some ways to automate this. Either code it yourself, buy CrawlWall or implement a solution similar to Ekstreme's PHP Search Engine Bot Authentication.

Rogue spiders are no fun, as we have seen in cases with some forums.

Forum discussion at Cre8asite Forums.

posted rustybrick in Google Search Engine at March 7, 2007 7:13 AM Comments (1)

Using CSS To Hide Text: Search Engine Responses

A WebmasterWorld thread sparked this post from me. At SES Chicago '06, during a session named CSS, AJAX, Web 2.0 & Search Engines, the search engine representatives were asked about how they handle CSS.

It is currently easy to hide text using CSS; everyone knows it. But do people do it?

Back to the SES session: on this panel were search engine reps. Many of the search reps were new to conferences and were not necessarily prepared to get certain questions. It all started when a Yahoo representative told the crowd to open up their CSS so Yahoo can peek into it. Then Google said they will also be indexing JavaScript and AJAX and CSS, so don't use it to hack.

Now, if you know Yahoo! and specifically Google, they typically will never say that they will be doing anything in the future. They typically first do and then tell, but not tell and then do.

All the search engines, except for one, I believe (but I forgot if it was Ask.com or MSN), said that you should not block your CSS and JavaScript files from the search engines using your robots.txt, just in case they want to take a peek.

I am honestly still confused by that statement. Well, if we block it, will it raise a red flag? If it raises a red flag, will you manually peek? Are you going to algorithmically crawl those files and look for problems if we keep them accessible to you? If we format something in a way that may look like spam but in reality is not, will the site get an automated ban?

Personally, I am not worried. But these types of responses, by the search engines, can fuel a lot of questions and unnecessary worries.

As pageoneresults says in the WebmasterWorld thread:

Google has a hard enough time now dealing with html/xhtml. Parsing CSS files and determining whether something is hidden or not is not a solution. Now the bot would need to determine why that CSS exists. There are many valid uses of display:none or display:hidden.

For those who may be hiding things through CSS or negatively positioning content off screen to manipulate page content, I surely wouldn't do that with any long term projects. ;)

The penalty for getting busted using this technique I would imagine is a permanent ban. No if's, and's, or but's, you're history. You'll need a pardon from the Governor to be reconsidered for inclusion. ;)

Forum discussion at WebmasterWorld.

posted rustybrick in Spam at December 18, 2006 7:42 AM Comments (7)

Cloaking in Google Should be Acceptable for Religious Purposes

Shimon Sandler wrote a post named Cloaking An Ecommerce Site where he asks whether you should be allowed to cloak your e-commerce site for religious purposes. The situation is that some orthodox Jews shut down their web sites on the Sabbath. So if you go to a site such as B&H from sundown Friday to darkness of Saturday night, you should get a sign that says they are closed. You will not be able to browse the site or look at products. I believe (but I am not sure) a one-page message comes up telling the customer to come back after the Sabbath.

The issue here is that this may impact the site in the search results the rest of the week, and not just on Sabbath. Why? Because Google and other search engines may send a spider to the site and it will be inaccessible for about 25 hours, once per week.

So, Shimon asks, should Google allow you to cloak the site on that day only, by allowing spiders to access the main site, but not allow humans to access the site?

My argument, as I posted in Search Engine Roundtable Forums, is:

Well, just like you are allowed special treatment for religious reasons, i.e. you can take standardized tests on sundays instead of saturdays in NY (I think), or you get kosher food on airlines, or you don't have to swear on a bible in court.... Why not special treatment for Google for Religious Reasons?

So should Google allow cloaking for religious reasons?

Forum discussion at Search Engine Roundtable Forums.

posted rustybrick in Cloaking / IP Delivery at November 8, 2006 11:56 AM Comments (3)

Scrape Bots Vs. Search Bots :: Fighting the Battle

A Search Engine Watch Forums thread asks how one can prevent scraping of a site's content by a non-authorized spider, while not hurting the site's rankings in search engines.

This is a serious issue, serious enough that there was a session about this named The Bot Obedience Course at SES San Jose 2006. In that session, Bill Atchison from CrawlWall.com gave an excellent presentation.

Robert Charlton at the thread notes that Bill will be releasing a software tool that helps do just that. He said there is a "Beta version coming soon." The crawlwall.com/technology.html page has details of the technology developed by CrawlWall.com.

CrawlWall uses the following technology to secure your website and protect your content. All of the various methods are designed to work together in harmony to make sure that all of the spiders with permission and legitimate visitors get into your website without issue and all of the rogue crawlers get stopped and never gain admission.

The tactics include dynamic robots.txt files, whitelist opt-in permissions, "second pass filters," IP and/or address banning, proxy blocking, creating certain obstacles, and a quarantine list for uncertain IPs.

I am looking forward to seeing how it works in the real world.

Forum discussion at Search Engine Watch Forums.

posted rustybrick in Cloaking / IP Delivery at September 12, 2006 7:06 AM Comments (1)

Brian White, Newest Google Representative To Hit Forums

Brian White, part of Matt Cutts' webspam team, has joined Search Engine Watch Forums to continue the outstanding Google to Webmaster communication we have seen recently. Brian adds some more detail to our post yesterday on Web Hosts Found Cloaking Webmaster Content.

Brian explains what exactly is being done by this fraud:

We've discovered that the likely explanation is that a third party gained access to a number of sites and dropped files in these accounts (including a modified .htaccess using rewrite rules) for the purpose of rewriting the home page through a proxy script. The proxy script adds links when Googlebot visits, and in a sinister twist, adds the rel=nofollow link to cap off PageRank bound for any external URL not under control of this third party. As Danny noted, they also add a NOARCHIVE meta tag to disable the cached version in results.

He then clarifies that Google has made sure to block any PR boost or ranking boost this person is trying to achieve.

Finally, Brian explains additional methods for you to see if this is a problem on your site.

At the risk of allowing the folks who created this to adapt, you can use Google Translate to confirm the behavior. Check any of the affected sites (no Cached link) on the Google search ["hairy sex porn free"] via Translate to see the cloaking, since the proxy script checks for a visit from Googlebot IP addresses, and doesn't discern between a regular crawl visit and a Translate request.

Continued forum discussion at Search Engine Watch Forums and welcome Brian!

posted rustybrick in SEO Forum News at August 15, 2006 7:11 AM Comments (0)

Someone Stealing Your Content? Play With Them

A Cre8asite Forums thread has a classic discussion taking place. The question is, what can I do about someone who steals my content? We have discussed it here time and time again. But this thread has a funny yet serious example of what can happen if you mess with the wrong site.

Ha.ckers.org noticed he and some friends were being messed with and he didn't like it, so he deployed a little trick.

So anyway, it was fairly trivial to figure out who was ripping my RSS feed. So it took me a few seconds to modify my document management system to do some IP delivery to the moron, and a few seconds of searching on the web for some nice prescription drug spam and poof!

So, if you are going to mess with someone, don't pick a site or blog of a hacker.

Forum discussion at Cre8asite Forums.

posted rustybrick in SEO Copywriting at July 14, 2006 7:25 AM Comments (0)

Image Hotlink Protection & Image Search Engines Like Google Images

A WebmasterWorld thread asks if there are any issues with using hotlink protection for your images and the same images suffering in image search. Hotlink protection, if you do not know what it is, is a way to dissuade others from pulling your images directly from your server. You can use hotlink protection, such as with htaccess, to either block the request or serve up a different image to those pulling the images from you. But does this affect your search rankings in image search engines like Google image search?

Most of the folks in the forum discussion say there is no issue with Google and hotlink protection. Some recommend that you allow certain domains to display the images properly, such as your own domain (duh) and the shopping search engines (if that applies), news engines (if that applies), blog engines, image search engines and so on. But that list can get long.
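To make the "allow certain domains" idea concrete, here is a rough Python sketch of the referrer check that hotlink protection boils down to. It is only an illustration, not what any forum member posted: the allowlist entries and the function name is_allowed_referrer are hypothetical, and a real setup would more likely live in htaccess rules as mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: your own domain plus any engines you want to
# be able to display your images properly.
ALLOWED_REFERRER_DOMAINS = {"example.com", "images.google.com"}


def is_allowed_referrer(referer, allowed=ALLOWED_REFERRER_DOMAINS, allow_empty=True):
    """Decide whether an image request should get the real image.

    referer is the value of the HTTP Referer header (may be None or "").
    Many browsers and crawlers send no Referer at all, so this sketch
    lets empty referrers through by default.
    """
    if not referer:
        return allow_empty

    host = (urlparse(referer).hostname or "").lower()

    # Match the domain itself or any of its subdomains.
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

A request that fails the check would then get blocked or be served the alternate image. As noted above, the allowlist is the part that can get long.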

Forum discussion at WebmasterWorld.

posted rustybrick in Search Engine Optimization at July 6, 2006 8:23 AM Comments (0)

NY Times is Cloaking But Not Spamming; Danny Sullivan Says

On Friday, I asked New York Times Allowed to Cloak Content?, where I explained that I felt the NYTimes was indeed cloaking content, based on Matt Cutts' interpretation, and that they are receiving special treatment from Googlebot. Danny Sullivan posted his thoughts in the forum thread, stating clearly:

Do I think the NYT is spamming Google? No. Do I think they are cloaking? Yes. Do I think they should be banned because Google itself warns against cloaking? No.

Yes, Danny believes they are cloaking. But no, Danny, like many others, feels that Google should not ban NYTimes.com or others like them.

Of course, there are others who do not feel that this is a typical situation of cloaking. And cloaking can be defined differently. But I prefer to use Google's definition of cloaking, or at least Matt Cutts' definition.

Strong question:

IF IT'S SUBSCRIPTION INFORMATION, I NEED TO KNOW IT BEFORE I CLICK!!!

That would be nice. I try to always let my readers know when I link out to a subscription-required link. Some news search engines do that also, by adding "registration req." or "subscription" in small text. If Google is allowing this, then at least give us that detail. And at least enable all publications to do the same. And clarify your policy on such "cloaking" practices.

Continued forum discussion at Search Engine Watch Forums.

posted rustybrick in Cloaking / IP Delivery at June 19, 2006 7:20 AM Comments (0)

New York Times Allowed to Cloak Content?

A SearchDay article by Danny and Chris over at Search Engine Watch named Getting The New York Times More Search Engine Friendly talks about how Marshall Simmonds (first with About.com and then acquired by NY Times) made the NYTimes.com search engine friendly. Part of that process is to allow the search engines, including Google, to access, crawl, index and rank content that would require a username and password by a normal Web user.

Danny and Chris ask the question and answer it; "Isn't this cloaking—serving different pages to a search engine and an individual web browser? Yes, it is." Yes, there is a BUT;

Although both Google and Yahoo warn against cloaking, Marshall says both companies are aware of what the Times is doing, and apparently condone the practice.

"They want the content, and they're very interested in displaying it," says Marshall.

Reviewing the latest from Google on cloaking, you see that Matt Cutts makes a clear distinction:

So IP delivery is fine, but don't do anything special for Googlebot. Just treat it like a typical user visiting the site.

NYTimes.com is clearly doing something "special for Googlebot" here, and in terms of how Matt Cutts defines "acceptable cloaking," this does not fall within those terms. At other engines like Yahoo!, Ask and MSN, engines that have not taken as strong a stance on cloaking, this most likely would be acceptable. But at Google, I believe, based on Matt Cutts' continued campaign against cloaking, this would not fall within Google's webmaster guidelines.

Forum discussion at Search Engine Watch Forums.

posted rustybrick in Cloaking / IP Delivery at June 16, 2006 8:25 AM Comments (0)

Google On Cloaking & IP Delivery

Matt Cutts and Google have always taken a strong stance against any form of cloaking, for as long as I have been covering the search industry. While I was away, Matt Cutts posted a comment at his blog where he provided the "short answer from Google’s perspective."

IP delivery: delivering results to users based on IP address. Cloaking: showing different pages to users than to search engines.

IP delivery includes things like "users from Britain get sent to the co.uk, users from France get sent to the .fr". This is fine-even Google does this.

It's when you do something *special* or out-of-the-ordinary for Googlebot that you start to get in trouble, because that's cloaking. In the example above, cloaking would be "if a user is from Googlelandia, they get sent to our Google-only optimized text pages."

So IP delivery is fine, but don't do anything special for Googlebot. Just treat it like a typical user visiting the site.

It all comes down to intent, as Cre8asite Forum Site Admin says in the forum thread named Final word on cloaking?

posted rustybrick in Cloaking / IP Delivery at April 21, 2006 2:06 PM Comments (1)

Current Issues With Cloaking Using IP Delivery Technology

Rand Fishkin started an excellent thread at Cre8asite forums named Cloaking Beyond IP Delivery, Discussing Other Methods to Limit Access. In his thread, he asks people for alternatives to the cloaking through IP delivery methods. He states that several black-hat SEOs have said cloaking through IP delivery "has become passe." Rand asked for alternatives. Instead of focusing on the alternatives, which can lead a site to be kicked out of the index if implemented wrong, let's focus on the history of IP based delivery to perform what is known as cloaking.

For that we must read Ammon Johns' post, which goes through some of the history of IP delivery and its faults. Ammon explains that based on how IP delivery works, if you are missing an IP address of a spider, then the spider will be served your standard page and not the page you would like to serve up to the spider. So Ammon then moves on to what he calls "decloaking hazards" that have shown up over the past several years. Here is a list, provided by Ammon, of some of the "decloaking hazards:"

  • Translation services, such as Alta Vista's Babel Fish Translation that showed translation of the page the spider was served and not what the end user was served.
  • The Cache feature at the search engines: when you click on the cached page, it shows the page the spider was served and not what the end user was served. Of course you can tell the search engine not to cache the page, but back in the day, doing so was a great way for search engines to find sites that were likely to cloak - it raised a red flag. Ammon explained that cloakers had to use hidden text on the cloaked pages, to serve up a page that looked similar in the cache to those that the end users saw.
  • Toolbars and Desktop search came out, giving search engines yet another method "to 'sample' exactly what the user is getting if it is even one bit different to what the engine has recorded."
  • Finally, search engines can and probably do send spiders through proxies and have spiders that act more human-like, making them extremely hard to detect and cloak properly.

So before deploying a form of IP Delivery, discuss it with professionals and also make sure to check out the Cre8asite Forum Thread.

posted rustybrick in Cloaking / IP Delivery at March 15, 2006 7:47 AM Comments (0)

JavaScript Gateway Protection Pages

Wednesday night, during the "Exhibit Hall - Cocktail Reception," I spotted Matt Cutts a few yards away from the Google booth. On my way over to say hi, which I really hadn't done the whole time at the conference, some Scandinavian folks came over to ask him a question.

They explained that they run a Vodka site (not sure on the brand) and said they needed to pre-qualify that anyone who enters the site says they are 18 years of age or older. They asked if they would be allowed to add a popup via JavaScript that sits above the page content and only goes away if the visitor answers the pre-qualifying question. Matt said it's a tough question, but in that case, he would feel comfortable with it.

Matt explained that the page was not "cloaked" because it was not showing different content to the search engine and the end user. And that it would be an acceptable use of this strategy, to enable the bots to spider the site and pre-qualify users before seeing the content.

Of course, I chimed in; I doubt the Scandinavians knew who I was anyway. I said, "Matt, I am shocked that you would say that. To use a JavaScript popup to hide content from some end users and show it to others? It is just mind-blowing that a person of your reputation would say that it would be acceptable." Matt started to explain why it would be acceptable, and then the Scandinavians also started to explain to me why it would be fine. I quickly said I was just giving Matt a hard time and that I agree with them.

Matt explained to me later, after I got permission to post this entry, that he may not be ok with using a CSS layer above the content. He explained that he "was only referring to JavaScript that would do a pop-up, not layers." He further explains, "But if both a search engine and the user get the same content, and the content includes JavaScript that a user must affirmatively answer yes/no, that wouldn't violate our guidelines."

My only question to Matt now is the following. Matt, what about those new young GoogleBots? Are they allowed to see content meant for the eyes of 18 year olds or older? I mean, how does the dog years work for GoogleBots? :)

posted rustybrick in Google Optimization at November 18, 2005 8:01 AM Comments (2)

Cloaking is Not the Magic Bullet

At least that is what the cloaking genius, Fantomaster, says in a Search Engine Watch thread renamed to A Little Help Needed Regarding Cloaking.

The thread starts off with an SEO saying that his client is set on signing up with Fantomaster's products or services. However, the client feels that by deploying cloaking, you will rise to the top without any work or effort. That is not the case! Even Fantomaster himself said, in the thread:

So if that client of yours is so desperate indeed and truly believes that cloaking will solve all his problems, I'd positively advise him against going for it because he's obviously quite clueless regarding the SEO game in general. For as Mikkel and Ammon and so many others have already pointed out: it's about business - and it's about the correct use of tools.

The bottom line is that when using cloaking, it is important to understand why. You need to understand your short-term and long-term goals. A professional and experienced SEO will then be able to formulate an effective SEO strategy for you. It may be a mix of white-hat SEO and black-hat SEO, or it may be one or the other.

More at the forum.

posted rustybrick in Cloaking / IP Delivery at October 14, 2005 8:41 AM Comments (0)

Yahoo! Caught Cloaking

Mikkel deMib Svendsen, SEW Mod - SES Speaker - SEM Industry Leader, created a thread yesterday named Is Yahoo cloaking Yahoo?

Mikkel points to this cached page of www.maddhattentertainment.com/jbm/index.pl/schiavo-garden-blog-blog.html. Mikkel asks;

What is the relationship between the site above, JustBlogMe.com and Blo.gs - they seem it all be part of some kind of interlinking,redirecting scam project. I haven't spend much time on it yet, but I just wanted to throw it out to all of you. It just dosn't look "clean"

blo.gs say on the website that it has been purchased by Yahoo. maddhattentertainment.com on the other side dosn't say anything about Yahoo (and it does look abit on the "adult" side ...). But if you go to for example: maddhattentertainment.com/jbm/index.pl/schiavo-garden-blog-blog.html (found in the Yahoo index) you are redirected to justblogme.com - allthought the title in Yahoo says blo.gs

SEOmike replies that a company like Yahoo! should know how to cloak without getting easily caught. He offers, "Yahoo next time please give me a call and consult with me before your next cloak, I'll be glad to make it completely undetectable."

Mikkel further explains that the point is not that they are doing some form of cloaking, but rather:

My only point is just that is amazing how little many of the people INSIDE the engines that are responsible for publishing content knows about their own webmaster guidelines and it makes it very dificult for the rest of us to even consider taking those guidelines serious when they prove again and again that they don't really care - or, are able to live up to it even within their own crew.

It is so true, especially with a company the size of Yahoo! They have many divisions; one is the search division. The divisions don't always communicate and rarely understand the details of each other's work. So it is Tim Mayer's responsibility to get to the bottom of this, which he will - "We will look into it and make sure the issue is resolved to abide by our content guidelines," Tim says.

posted rustybrick in Cloaking / IP Delivery at July 21, 2005 8:47 AM Comments (1)

The Future of Cloaking & SEO

A member at Search Engine Watch forums started a new thread named Cloaking, load of BS?, where he asks, "Does anyone come across cloaked sites in the SERPs anymore, 'cos I don't seem to." The question is, is there a future for cloaking in the search engine optimization world?

What makes this thread fun is that we have fantomaster, the king of cloaking, chime in and then GoogleGuy, who absolutely hates cloaking, discuss the concept as well. So far, everything is very professionally discussed.

Fantomaster says, "demand [for advanced cloaking products] has been on a steep rise and if current figures are anything to go by, we for our part expect to double our cloaking products and services related revenue this year."

Fantomaster explains that the reason the original thread creator, glengara, doesn't see the cloaked results is because, as rcjordan said, "the big dogs -good & bad- are using high-grade cloaking as much or more than ever." Basically, the cloaking tactics practiced by the top experts are extremely hard for the end user to recognize. Fantomaster continues by saying, "sure, you may have a hunch perhaps, but it will never be more than guesswork."

Mikkel, who, like fantomaster, has been around forever, said:

Cloaking is definitely still around - I know for a fact. The main reasons my clients chose cloaking is because they are either too lazy or restricted to change their own site and work, because they want to target keyword variations or misspellings they are not allowed to have on the website, or because the entire site is graphic or multimedia based.

Back to the topic of whether cloaking is a growing or declining business. GoogleGuy posts that it is declining, based on his observations. He said: "In my experience there's fewer cloaked domains in SERPs these days. It's not completely gone though; there are still some sites that cloak, especially when the owner views the domain as disposable."

It is possible that there are fewer novice cloakers out there, so fantomaster (Mikkel as well) and GoogleGuy can both be right. There has never been a published paper on the number of cloaked sites in the search engines as of today, compared to 5 years ago. I am sure the absolute number is higher, but as a percentage of all sites on the Internet, it might be lower. Again, we simply do not know. Maybe Gary Price can locate a paper on this topic, with actual figures and data? :)

posted rustybrick in Cloaking / IP Delivery at May 20, 2005 11:08 AM Comments (0)

IP Detection for Currency Content Delivery

As many of you know, there is no fine line when it comes to what is acceptable IP Delivery and what is not. Google clearly states in its Quality Guidelines, "Don't employ cloaking or sneaky redirects." But Google "cloaks" or uses IP delivery methods itself. I have a whole category at this site devoted to Cloaking & IP Delivery, and you will find that even Google Banned Itself for Cloaking. But is there acceptable cloaking?

Of course. One such case of acceptable cloaking is found in a brand new, and really quick thread at WebmasterWorld named Display Currency Prices Depending On IP Address. The member simply asked,

A client of mine wants me to set up his site, so that if the ip address comes from America, then all prices should be in US dollars. From the UK, pounds, and Europe etc ... Is this going to cause a problem with google?

And Brett Tabke simply answered, "No."
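For the curious, the kind of IP delivery the member describes might look something like the Python sketch below. Everything in it is an assumption for illustration: the network-to-country table uses reserved documentation IP ranges as stand-ins for a real geolocation database, and the names COUNTRY_BY_NETWORK, CURRENCY_BY_COUNTRY and currency_for_ip are made up. Note that there is no special case for Googlebot anywhere; a spider simply gets whatever currency its own IP address maps to, like any other visitor.

```python
import ipaddress

# Stand-in for a real geolocation database (hypothetical data; these are
# reserved documentation ranges, not real country allocations).
COUNTRY_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "GB",
    ipaddress.ip_network("203.0.113.0/24"): "FR",
}

CURRENCY_BY_COUNTRY = {"US": "USD", "GB": "GBP", "FR": "EUR"}
DEFAULT_CURRENCY = "USD"  # assumed fallback for unknown visitors


def currency_for_ip(ip):
    """Pick the display currency for a visitor based only on the IP address."""
    address = ipaddress.ip_address(ip)
    for network, country in COUNTRY_BY_NETWORK.items():
        if address in network:
            return CURRENCY_BY_COUNTRY.get(country, DEFAULT_CURRENCY)
    return DEFAULT_CURRENCY
```

With this table, currency_for_ip("198.51.100.7") picks pounds, while an address that is not in the table falls back to the default.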

posted rustybrick in Cloaking / IP Delivery at May 12, 2005 10:14 AM Comments (1)

fantomNews and Fantomaster Site Update

The legendary Fantomaster has finally updated the look and feel of his fantomas site. He offers the leading IP delivery, cloaking, content generation, and "spam" suite toolkit on the Web. He has been around for longer than most people and has really made a name for himself.


Plus, Ralph (aka fantomaster) has updated the fantomNews section of the site. To have some fun, he will be releasing a weekly cartoon series he calls fantOon. It is updated on Mondays, and I bet the cartoons might tick off a few search engine reps from time to time. In addition, there is an RSS feed selection for you RSS feed nuts (like me).

posted rustybrick in Cloaking / IP Delivery at April 7, 2005 8:38 AM Comments (0)

Google Bans Itself for Cloaking

Yesterday Ben wrote an entry named Google Cloaking Its Own Pages! where he describes the forums discussing the slip-up on Google's part. In the thread at WebmasterWorld, on the second page (message #27 by GoogleGuy), we hear the official Google response.

Those pages were primarily intended for the Google Search Appliances that do site search on individual help center pages. For example, http://adwords.google.com/support has a search box, and that search is powered by a Google Search Appliance. In order to help the Google Search Appliance find answers to questions, the user support system checked for the user agent of "Googlebot" (the Google Search Appliance uses "Googlebot" as a user agent), and if it found it, it added additional information from the user support database into the title.

The issue is that in addition to being accessed via the internal site-search at each help center, these pages can be accessed by static links via the web. When the web-crawl Googlebot visits, the user support system thinks that it's the Google Search Appliance (the code only checks for "Googlebot") and adds these additional keywords.

That's the background, so let me talk about what we're doing. To be consistent with our guidelines, we're removing these pages from our index. I think the pages are already gone from most of our data centers--a search like [site:google.com/support] didn't return any of these pages when I checked. Once the pages are fully changed, people will have to follow the same procedure that anyone else would (email webmaster at google.com with the subject "Reinclusion request" to explain the situation).
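To see why the web-crawl Googlebot tripped the same code path, here is a small Python sketch of the kind of check GoogleGuy describes. It is not Google's code: the user agent strings are sample values, and the "stricter" variant, which also requires the request to come from an internal network, is purely my own assumption about one way such a check could be narrowed.

```python
import ipaddress

# Sample user agent strings, for illustration only.
WEB_CRAWL_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
APPLIANCE_UA = "Googlebot"  # hypothetical appliance user agent

# Hypothetical network the site-search appliance would connect from.
APPLIANCE_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def loose_check(user_agent):
    """The check as described: only looks for 'Googlebot' in the user agent."""
    return "Googlebot" in user_agent


def stricter_check(user_agent, remote_ip):
    """Also require that the request originates from the appliance network."""
    return loose_check(user_agent) and ipaddress.ip_address(remote_ip) in APPLIANCE_NETWORK
```

Both sample user agents pass loose_check, which is exactly the slip-up: the extra keywords get added for the web crawl as well as for the appliance.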

Many other forums and blogs are discussing this. Mikkel at Search Engine Watch Forums asks some tough questions. ThreadWatch.org has their normal humorous look at the topic. And even the folks over at SEO Chat are discussing it.

posted rustybrick in Cloaking / IP Delivery at March 9, 2005 8:25 AM Comments (0)

Hosting Companies Take Advantage of Clients

There are forms of cloaking that some would consider acceptable practices and then there are some forms of cloaking that everyone would consider unacceptable. One such case of cloaking was presented by Phil Craven over at the Search Engine Watch Forums under the thread title Obnoxious cloaking scam. In that thread, Phil discusses a case where he found a client being taken advantage of by his hosting company. Basically, the hosting company was serving up a cloaked page to the search engine bots on arrival. Text links that benefit the hosting company were added to these cloaked pages without the consent of the client. In addition, new subdirectories were added to the unsuspecting client's site.

GoogleGuy saw the thread, as PhilC and others wanted him to, and offered to help. It's important for this type of scam to get out there in the public, so please tell people about this thread and this type of cloaking.

posted rustybrick in Spam at March 6, 2005 9:53 AM Comments (0)

Cloaking for the Future

The rarely spoken about 'art' of cloaking is being openly discussed over at the Search Engine Watch forums. In fact, one of the most recognized leaders in the cloaking industry, fantomaster, has agreed to answer questions on the topic of cloaking. The thread's name is Cloaking 101, and I have taken this opportunity to get some of my questions answered. Let me take you through a summary of the highlighted points in the thread; many are basic, hence the name of the thread.

  • Very hard to get quality links to cloaked pages with Shadow Domains, because these pages are invisible to the Web user.
  • The first thing that caught my eye was that fantomaster said that Google uses the Google Toolbar to find new pages; the link goes to an entry I wrote on just that topic.
  • When using Shadow Domains, it is smart to have the name related to the product, in order not to annoy the redirected Web visitor.
  • There is a big difference between IP delivery and User Agent redirection.
  • The reasons to deploy cloaking are mostly about retaining 'control': layout, rankings, dynamic page issues, flash problems, multimedia, session ids and so on.

Moving outside the basics, the thread moved on to a topic I have discussed here a couple of times, most recently over here. Read this before reading the rest of this entry. So you see that search engines (IMO) will be using a form of image retrieval to see what content is located in which 'blocks' on a page. How does this relate to cloaking? Well, if you read the thread, I discuss how dynamic content delivery can be utilized in conjunction with dynamic IP delivery. So if a search engine is visiting a page, the text ad links will be found within the blocks that are worth something as opposed to the blocks that are worth nothing or next to nothing.

But as you can see by one of my posts, I am stuck trying to determine if redirection or dynamic content delivery is the right way to go for this.

Disclaimer: I have never deployed cloaking in any form, and I do not personally know the risks involved. Search engines clearly frown upon any form of cloaking. This entry was inspired by the Search Engine Watch thread named Cloaking 101 - Questions and Answers.

posted rustybrick in Cloaking / IP Delivery at October 17, 2004 12:47 PM Comments (0)

Stealth Browsing with FireFox

NickW over at Search Engine Watch has an excellent post on how to make your Web browsing invisible; he named the post 007 Stealth Browsing with FireFox/Moz. He explains there are many reasons why one would want to browse the Web in invisible mode, including: SERPs check paranoia, inventive marketing (for which he links to a post that mostly discusses spam marketing), competition scoping, or just plain paranoia. I am just going to copy and paste the how-to portion of his post here; for more information, visit the thread.

1) Grab FireBird or Mozilla
2) Get an account with a decent proxy provider (search for it and click the adwords ads - the free ones are monkey #£@!,) More on proxies in the kung fu link above.
3) Set up a profile (tools - profiles - manage profiles and call it kung fu or whatever you want. Switch to that profile.
4) Download the exellent Switch Proxy and User Agent Switcher extensions and install them. With the proxy one, i suggest that you use the text file of proxies option, it allows you to specify unlimited amounts of proxies and switch proxy every X seconds. Even better if they come form different countries...
5) Then type this in the browser address bar 'about:config' - no quotes - thanks to NFFC for that...
6) Find the line 'network.http.sendRefererHeader' and set it to 0 to stop people seeing where you've come from - you might also want to set the 'network.http.sendSecureXSiteReferer' to false. I've no idea what that does but it looks naughty to me..
7) Then go to Edit - Prefs - Privacy and disable JS and Cookies
8) Go to Edit - Prefs - Advanced and disable Java - if you have java enabled your proxies are useless.
9) Once you've done all that, you'll need to configure the switch proxy and user agent switcher extensions, they're dead easy, just read the help text...
stealth-browsing.gif

posted rustybrick in Cloaking / IP Delivery at October 10, 2004 12:43 PM Comments (0)

Fix Doorway Pages on Your Client's Site To Do No Harm

So you have just taken over a new client's site, you have all the details sorted out, and you know they have doorway pages infecting some area of the site. The first step is to get rid of those bad boys. The second step is to make sure the search engines know there are no more pages in that location. So, what do you do to best remedy the situation without doing your client or their website any harm in the search engines?

Additional Important Questions:
Do I use a 301 redirect or a 404 page not found in place of these pages?
Do I need to conserve the pagerank of the original page by passing it on?
What happens if I use a 404?
What if the search engines have already spidered these pages?

There is a good thread over at Cre8asite Forums that details this very case. The member Mike521 dealt with a situation just like this: he got rid of the doorway pages generated by a previous company and recommended putting a 404 page not found in their place, basically removing the pages completely. Ammon Johns goes on to say that "Search engines should always drop any correctly formed 404 error URL. The only times a search engine retains a 404 page is when the custom 404 page has been poorly done". Additionally, if the doorway pages have no real value in terms of PageRank, it would seem pointless to do a 301 or 302 redirect to another page.
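The choice discussed in the thread can be sketched as a small decision function. This is only an illustration, not anything posted in the Cre8asite thread: the paths, the has_valuable_links flag, and the replacement URL are all hypothetical names made up for the example.

```python
from http import HTTPStatus

# Hypothetical doorway URLs that were deleted from the client's site.
REMOVED_DOORWAY_PATHS = {
    "/doorway/cheap-widgets.html",
    "/doorway/best-widgets.html",
}


def response_for(path, has_valuable_links=False, replacement="/"):
    """Return (status, headers, body) for a requested path.

    Removed doorway pages get a real 404 status unless the caller decides
    the old URL has link value worth passing on, in which case a 301 is used.
    """
    if path not in REMOVED_DOORWAY_PATHS:
        return HTTPStatus.OK, {}, "regular page"
    if has_valuable_links:
        # Permanent redirect: only worth it when the old URL earned links.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": replacement}, ""
    # The custom error text is fine as long as the status code is really 404.
    return HTTPStatus.NOT_FOUND, {}, "Sorry, this page no longer exists."
```

The point of the sketch is that the friendly error text and the status code are separate things. One common way a custom 404 page ends up "poorly done" is by showing the error text while still answering with a 200 status, which gives a crawler no signal to drop the URL.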

Check out the thread on Fixing Doorway Pages

posted Phoenix in Spam at October 7, 2004 7:22 PM Comments (0)

Microsoft Deploys Novice Cloaking - Doorway Pages

In a post over at Search Engine Watch Forums, NFFC, a former admin at WMW, reveals that Microsoft is deploying easy-to-detect cloaked or doorway pages. Want to see?

This news seems to be taking off. A thread at Cre8asite Forums, started by Webby and named Microsoft using doorway pages, delves deeper into this topic. As noted in the thread, if you turn off JavaScript in your browser and visit this page, you will see a page that reads "Welcome to our company. This page has been designed to help our visitors finding directly the information, product or service they are searching in our websites." First, this sentence doesn't even read well, and second, this is a search engine "no-no". If you want to see a screen image of the page, click here.
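Turning off JavaScript by hand works, but the same check can be roughly approximated in a script. The sketch below is an assumption-laden illustration, not a description of how the Microsoft pages were actually built: it pulls out the text a visitor would see with JavaScript off and flags scripts that appear to redirect the browser. The class name, function name, and redirect pattern are all made up for this example.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: common ways a script sends the browser somewhere else.
REDIRECT_PATTERN = re.compile(r"location\s*(\.href)?\s*=|location\.replace\s*\(")


class NoJsView(HTMLParser):
    """Collects the text visible with JavaScript off, plus raw script bodies."""

    def __init__(self):
        super().__init__()
        self._skip = None  # name of the script/style tag we are inside, if any
        self.visible = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag

    def handle_endtag(self, tag):
        if tag == self._skip:
            self._skip = None

    def handle_data(self, data):
        if self._skip == "script":
            self.scripts.append(data)
        elif self._skip is None and data.strip():
            self.visible.append(data.strip())


def inspect_page(html):
    """Return (visible_text, has_script_redirect) for an HTML string."""
    parser = NoJsView()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.visible)
    redirects = any(REDIRECT_PATTERN.search(s) for s in parser.scripts)
    return text, redirects
```

A page that has thin, keyword-stuffed visible text and a script redirect is worth a closer look by hand. The pattern is only a heuristic and will miss obfuscated redirects.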

Make sure to check out the thread.

posted rustybrick in Cloaking / IP Delivery at September 20, 2004 9:27 AM Comments (0)

Legendary Fantomaster Makes Appearance at SEW Forums

A thread I started over at SEW forums named How Do I Spot Cloaked Sites? has turned into a civilized and professional discussion on detecting cloaking. The thread did not, and should not, go into the realm of ethics; rather, it will remain on the "how to".

Halfway into the thread, there was a need for someone who would be considered an expert in cloaking. Danny Sullivan was able to encourage fantomaster (Ralph Tegtmeier), one of the most well-known experts in the cloaking field, to participate in the thread. At post #43, fantomaster joins the thread. He then goes on to post three additional times: post #56, post #68, and post #71.

posted rustybrick in Cloaking / IP Delivery at September 3, 2004 8:52 AM Comments (0)

How Do I Spot Cloaked Sites?

Forget the debate about cloaking; I am a bit tired of that anyway. How does one detect some of the cloaking going on around the Web? Follow these instructions:

(1) Download the Firefox Browser
(2) Install it
(3) Download the User Agent Switcher for Firefox/Mozilla while using Firefox
(4) Restart the browser
(5) Under Tools --> User Agent Switcher --> Options --> Options (that will open a dialog box)
(6) Click Add under the User Agents section
(7) In the description add "Googlebot" and in the user agent add "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
(8) Repeat this process for all the spiders you want to test. Updated comprehensive list of user agents.
(9) Under Tools --> User Agent Switcher --> select the user agent
(10) Then navigate to the pages that you want to test for cloaking.

Hope this helps some people be Googlebot. :)
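The same comparison can be roughly scripted. The sketch below is a minimal illustration under stated assumptions: the Googlebot string is the one from step (7), while the browser user agent, the function names, and the 0.9 threshold are invented for the example. It fetches a page twice with different User-Agent headers and reports how similar the two responses are.

```python
import difflib
import urllib.request

# The Googlebot string comes from step (7); the browser string is a placeholder.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleBrowser/1.0)"


def build_request(url, user_agent):
    """Build a request that announces itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Download the page body as text using the given user agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a, b):
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_cloaked(url, threshold=0.9):
    """Flag a URL whose bot and browser responses differ substantially."""
    as_browser = fetch(url, BROWSER_UA)
    as_bot = fetch(url, GOOGLEBOT_UA)
    return similarity(as_browser, as_bot) < threshold
```

Like the browser extension, this only catches cloaking keyed on the user agent. A site doing IP delivery will serve the same page to both requests because they come from the same address. Dynamic pages can also differ between two fetches for innocent reasons, so treat a low score as a prompt to look by hand rather than as proof.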

posted rustybrick in Cloaking / IP Delivery at August 31, 2004 10:59 AM Comments (0)

What Color Hat Do You Wear as an SEO?

Nothing like a forum thread on white hats versus black hats to stir up hot debate. Instead, why don't you just joke around about it and set up a poll. That is what one member did at Search Engine Watch. So far 57% answered the hat color question as "Who cares? It's all just posturing anyways...." What about you?

white-black-hat-seo.jpg

posted rustybrick in Search Engine Optimization at August 16, 2004 5:36 PM Comments (0)

Request for Acceptable Cloaking Usage Policy

Cloaking is out there and is practiced by thousands of Web sites. Where do Google and the other engines draw the line between acceptable and unacceptable cloaking?

A few weeks ago, Ben Edelman released information on WhenU and how they are using cloaking to beat the engines. Soon after, Google and Yahoo manually did something about it.

Today, Danny Sullivan reports that NPR is using cloaking. Will Google and Yahoo do something about this case? How does it differ?

Well, NPR is using cloaking to provide contextual information to the search engines on audio files. The audio files, which otherwise would not be indexable by the search engines, are transformed into text transcripts and served to Google only. Google reads the text version of the audio files, and when someone searches on a topic related to an audio file, NPR comes up in the results. The results look like the following; notice "And with us now to discuss Google's financial standing is ..."


npr-results-small.gif

But when you click on the result, it takes you to a page with the ability to download the audio file and contains no such text version of the transcript.

Danny gets into the pros and cons of this method of cloaking. I won't tell you exactly what he said, but if you are a paid subscriber to SearchEngineWatch, it makes for a nice read.

Andy Beal also spoke on this matter, "He [Danny Sullivan] comes to the conclusion that NPR is effectively using the spam technique, cloaking. But, I [Andy Beal] would argue that perhaps NPR converting its audio into text is no different that including ALT tags on images or tagging Flash content."

Either way, we need the search engines to come up with a clear acceptable cloaking policy. There is no doubt that cloaking can benefit the end user; the legendary SEO named fantomaster was a huge advocate for the use of cloaking to benefit the searcher. This is my call out to the search engines to take a stand and come out with a clear, defined policy. This does not have to be a war between the Search Engine Marketer and the Search Engine Provider. We can work together, can't we?

posted rustybrick in Cloaking / IP Delivery at May 28, 2004 11:16 AM Comments (0)

Cloaked Sites Ranking Well After Leaving Sandbox?

The problem I have with some forums is that they don't give specific examples, but one such forum has enough of a member base to take "hearsay" as almost factual. I personally hate reporting on threads that I cannot verify or do not verify with my own tests, but this is one of those times that I will.

A thread named How can cloaked sites be ranking well at Google? over at WebmasterWorld discusses how recently people have been seeing cloaked sites 'polluting' the Google results.

The thread begins as follows:

What I don't understand is, that if their algorithm is so brilliant how come cloaked sites (the pages which are fed to the crawlers) have poor inbound links, low quality content, almost non-existent internal linking structure and yet they rank at the top? In my opinion, the pages that the cloaks feed to crawlers shouldn't rank highly even if they WERE the actual pages users were seeing!

To me it sounds like either these sites are ranking for non-competitive keywords, or they have inbound links with rich keyword anchor text from other cloaked backs, or Google doesn't care about anchor text. Which one sounds best to you?

posted rustybrick in Cloaking / IP Delivery at May 23, 2004 7:36 PM Comments (0)

WhenU's SEO Firm Was Synergy 6

Ben Edelman, the individual who brought the whole WhenU case to light, has posted a very interesting comment at this site. He said:

A few people have asked me which SEO WhenU used. After all, it would seem to be perfectly natural for WhenU to name the SEO, and to let the SEO confirm WhenU's statement of what happened here. But all the news coverage to date is silent as to which SEO did the work -- even news publications that directly interviewed WhenU's Avi Naider on this subject.

So, this seemed like a subject ripe for some technical examination. I've taken a look, examing IP sharing and HTTP responses. All signs point to Synergy6. See my new addition to the site:

Which SEO Did WhenU Use? The Best Inference: Synergy6
http://www.benedelman.org/spyware/whenu-spam/seo.html

Ben Edelman
benedelman.org

Thank you Ben for sharing this information.
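For readers curious what "examining IP sharing" can look like in practice, here is a minimal sketch. It is not Ben Edelman's actual method, which is described on his own page; the function names and the "unresolved" bucket are invented for the example. It resolves a list of hostnames and groups the ones that land on the same IP address.

```python
import socket
from collections import defaultdict


def group_by_ip(hostnames, resolve=socket.gethostbyname):
    """Map each resolved IP address to the hostnames that point at it.

    The resolve argument exists so a fake resolver can be swapped in.
    """
    groups = defaultdict(list)
    for host in hostnames:
        try:
            groups[resolve(host)].append(host)
        except OSError:
            # DNS failures (socket.gaierror is an OSError) go in their own bucket.
            groups["unresolved"].append(host)
    return dict(groups)


def shared_ips(groups):
    """Keep only the IP addresses that serve more than one hostname."""
    return {
        ip: hosts
        for ip, hosts in groups.items()
        if ip != "unresolved" and len(hosts) > 1
    }
```

Keep in mind that a shared IP address on its own proves little, since unrelated sites often sit on the same shared hosting server. It is one signal to combine with others, such as the HTTP responses Ben mentions.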

posted rustybrick in Cloaking / IP Delivery at May 16, 2004 4:10 PM Comments (0)

Google & Yahoo Publicly Do Something About Cloaking

The news is all over the forums and the Web. This is really the first public and manual manipulation of the search engine results made by Google and Yahoo for "cloaking." (updated: used wrong language here, sorry been a crazy day - please see Danny Sullivan's comment)

Yahoo and Google have disabled links to controversial adware maker WhenU after the company was accused of engaging in unauthorized practices aimed at boosting its search rankings, WhenU's top executive confirmed Thursday.

The blocking of this adware company, WhenU, is a major step in the search engine industry, and I look forward to seeing what will transpire.

Forum Coverage (sorry if I missed any):

posted rustybrick in Cloaking / IP Delivery at May 14, 2004 2:08 PM Comments (1)

Are you White or Black: Gray Hat SEO

Ammon Johns said here:

There are no hats.

There are three forms of SEO tactics:
1. Techniques currently rated as 'safe'.
2. Techniques rated as risky.
3. Risk balanced and risk managed techniques.

This black hat versus white hat SEO thread at Cre8asite Forums is not the average thread. Risk management is discussed, by bringing to light the pros and cons of each tactic. Check out the thread.

posted rustybrick in Search Engine Optimization at April 16, 2004 8:59 AM Comments (1)

Fantomaster - Ralph Tegtmeier

One of the first big advocates for cloaking and other forms of IP-based delivery methods is due an entry at this blog. I have never spoken to or contacted Ralph Tegtmeier (a.k.a. Fantomaster), but I have heard a lot about him.

I wanted to just highlight a few excerpts from Peter Da Vanzo's Search Engine Blog Interview with Ralph Tegtmeier. I will use them to answer questions I have personally received. Let me stress, these are NOT the questions Peter Da Vanzo asked Ralph Tegtmeier. Click on the link directly above if you want those.

Why did I not attend the "Meet the Crawlers" Session at the last Search Engine Strategies Conference? You can see it ever and again at search engine conferences all those search engine reps admonishing the participants not to do this, not to do that, to stick to this, to only do that, "be a good boy/girl across the board, do what daddy tells you, and we just might be a wee bit nice to you", wagging their index fingers and threatening SEMs with dirty looks - just like a bloomin' nursery! (laughs) And here are all these adult people, webmasters and marketing officers alike, gobbling it all up in awe like gospel. More often than not, it's a pretty pathetic spectacle.

I have done everything I could to make my site search engine friendly. I rank well for specific keywords and I am happy. However, I want to take my site to the next level but I don't want to cross any lines - how can you help me? There's a pervading myth in the search engine marketing and optimization industry that if you're a good boy, the engines will pat your head and will reward you with fine rankings, even if it may take an incarnation or two. That's unfortunate because not only does it fuzz up the hardcore technological issues involved, it also attracts all sorts of gut level thinkers to the SEM world, flogging their gut level advice ("content is king" being just one pervasive popular myth in question) and confusing each other and everybody else. This is a basically religious, moralistic attitude, and quite an inadequate one when dealing with technological issues.

So search engines are not our friends? But if you set out to use search engine generated traffic for your business model, you ought to realize that there's a generic conflict of interest being installed: you may want good rankings to achieve good returns, while the search engines couldn't care less about your turnover. All they ever want your content for is to expand their database to become more attractive to surfers. It's a number game, and as an individual webmaster you're always being shortchanged: if your business goes belly up, the search engines will simply feature someone else on their SERPs without wasting one thought on you. After all, they have billions of other pages to choose from.

Final quote: "Take your risks if you must - and don't complain if you happen to lose. Rather, pick up the shards and try anew. And if you should really find this business too nerve racking, maybe you'd be better off doing something else in the first place."

Credits to Peter Da Vanzo's Search Engine Blog Interview with Ralph Tegtmeier.

posted rustybrick in Cloaking / IP Delivery at March 10, 2004 8:19 PM Comments (0)
