Google Now Crawling Content Behind Forms

Apr 14, 2008 • 7:50 am | comments (8) | Filed Under Google Search Engine Optimization
 

We should have seen this coming, given the number of reports that Google was submitting GET forms. But reports like these are often hard to validate, since people spoof Googlebot and use similar tactics. In any event, Google now admits to crawling through HTML forms. Here are some things to know about this announcement, in bullet form:

  • For select menus, check boxes, and radio buttons on the form, Google will choose from among the values in the HTML.
  • After gaining access to content past the form, Google may or may not index that content.
  • You can block Googlebot from crawling your forms by excluding them in your robots.txt file.
  • Googlebot will only attempt to crawl GET forms.
  • Googlebot tries to avoid forms requesting user IDs, logins, passwords, contact information and so on.
  • This should not impact PageRank.
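
To see why GET forms with fixed choices are crawlable at all, here is a minimal Python sketch of how any crawler could turn such a form into plain URLs. It is not Google's implementation; the class name GetFormParser, the function candidate_urls, and the example.com form are all made up for illustration, and it only handles select menus (not check boxes or radio buttons).

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class GetFormParser(HTMLParser):
    """Collect the action and <select> option values of GET forms (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.forms = []      # finished forms: {"action": str, "fields": {name: [values]}}
        self._form = None    # the GET form currently open, or None
        self._select = None  # name of the <select> currently open, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML treats a missing method attribute as GET; POST forms are ignored.
            method = (attrs.get("method") or "get").lower()
            if method == "get":
                self._form = {"action": attrs.get("action") or "", "fields": {}}
            else:
                self._form = None
            return
        if self._form is None:
            return
        if tag == "select" and attrs.get("name"):
            self._select = attrs["name"]
            self._form["fields"][self._select] = []
        elif tag == "option" and self._select and attrs.get("value"):
            # Options with an empty value (e.g. a "Choose..." placeholder) are skipped.
            self._form["fields"][self._select].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form":
            if self._form is not None:
                self.forms.append(self._form)
            self._form = None


def candidate_urls(html, base_url):
    """Return one URL per combination of select values, for every GET form in html."""
    parser = GetFormParser()
    parser.feed(html)
    urls = []
    for form in parser.forms:
        names = list(form["fields"])
        if not names:
            continue  # nothing to choose from, so nothing to enumerate
        target = urljoin(base_url, form["action"])
        for combo in product(*(form["fields"][name] for name in names)):
            urls.append(target + "?" + urlencode(list(zip(names, combo))))
    return urls


# Made-up page: one GET form with two select menus, and one POST form.
SAMPLE_HTML = """
<form action="/search" method="get">
  <select name="color"><option value="red">Red</option><option value="blue">Blue</option></select>
  <select name="size"><option value="s">Small</option></select>
</form>
<form action="/login" method="post">
  <select name="lang"><option value="en">English</option></select>
</form>
"""

for url in candidate_urls(SAMPLE_HTML, "http://example.com/"):
    print(url)
```

Because a GET submission is just a URL with a query string, every combination above is an ordinary link as far as a crawler is concerned. The POST form produces nothing, which matches the point that only GET forms are attempted.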

Matt Cutts of Google explains how this meets a need of the many webmasters who are clueless about SEO. In fact, from the standpoint of making the web more accessible, this new crawling technique rocks. But SEOs and webmasters who want to block Google from accessing content will need to make some changes; that is, they will have to restructure parts of their sites to block Googlebot from crawling their forms.
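
If you go the robots.txt route, it helps to confirm that the rule actually covers the URLs a form generates. Below is a small Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the /search path, and the example.com URLs are assumptions for illustration, and the standard-library parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot away from a form's action URL.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""


def googlebot_may_fetch(robots_txt, url):
    """Return True if the given robots.txt text allows Googlebot to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# A URL produced by submitting the (made-up) GET form at /search.
print(googlebot_may_fetch(ROBOTS_TXT, "http://example.com/search?color=red"))
# An unrelated page on the same site.
print(googlebot_may_fetch(ROBOTS_TXT, "http://example.com/about"))
```

The Disallow rule is a path prefix, so it covers every query-string variation of the form's action URL without listing each one.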

The forum reaction is very mixed. We have threads at Sphinn, DigitalPoint Forums, Search Engine Watch Forums and WebmasterWorld.

Pros: Google can crawl places it hasn't before and index more of your content, which gives you more visibility. Cons: You might have to do more work to block pages you do not want indexed.

The big joke in WebmasterWorld is that Googlebot now has a credit card. For example, if it can submit forms, maybe Googlebot will start messing around with conversions. Obviously, it won't place orders, but what about submitting a simple form that you consider to be a conversion (e.g. user agreements and the like)? In fact, I found Googlebot filling out this Google Checkout form to buy itself some WD40 (kidding of course):

Googlebot Gets a Credit Card

But you get the point.

Forum discussion at Sphinn, DigitalPoint Forums, Search Engine Watch Forums and WebmasterWorld.

 

Comments:

matt

04/14/2008 08:34 pm

what does this mean exactly?

Rob Abdul

04/15/2008 01:24 pm

When you mark up a web page and create a form with drop-down boxes and input fields, Google can index the name/id attributes of those drop-downs and input fields.

Bill Kruse

04/15/2008 05:25 pm

How is this news? Years ago I used this as a technique: make sure there was a forms page with little content in it but related key phrases in the drop-down menu. As I recall, the page did well for the phrases. These were only extant in the drop-downs; I don't believe there were inbounds using anchors to the form. BB

Barry Schwartz

04/15/2008 05:28 pm

Bill, are you joking? This is news.

Bill Kruse

05/01/2008 01:26 pm

Ah. About a fortnight later and in another context, I begin to get it. Are we talking javascripted drop-downs and menus, the formerly untraversable to Google spiders kind of drop-downs and menus? That IS different. BB

Bill Kruse

05/01/2008 01:59 pm

My immediate (if apparently belated) reaction is that Google are chipping away at my income by making aspects of what I do slowly redundant :-( Further I can no longer trust my spider-checkers to advise me on what can and can't be crawled. Google need really, in light of this, to offer a Google Spider simulator, a completely up-to-date and honest one. Fat chance, eh? I think we'll all still have to carry on as we have regarding spidering for pure site navigation - usability - in general because there's Yahoo and MSN to think of. Keeping content unindexable so your credit card details aren't immortalised by accident in The Wayback Machine, that may well have got disturbingly harder. We'll all be back to using phones for everything confidential soon. This could well be a step forward that has us fleeing back into the past. BB

No Name

06/15/2010 05:39 pm

That's really a mess. As far as I know, forms were invented to give selective content, depending on the parameters you set in the form (checkboxes, radio buttons, etc.), so if Google indexes it, it should crawl all the variants with all parameter combinations, IMHO.

Peter Hanson

10/06/2012 01:57 am

Let Google crawl your pages, your search boxes, your forms, even up your butt, as you need Google more than Google needs you. NEVER block Google or you WILL be very sorry that you did.
