We should have seen this coming, given the number of reports that Google was submitting GET forms. But those types of reports are often hard to validate, since people spoof Googlebot and use similar tactics. In any event, Google now admits to crawling through HTML forms. Here are some things to know about this announcement, in bullet form:
- For select menus, check boxes, and radio buttons on the form, Google will choose from among the values in the HTML.
- After gaining access to content past the form, Google may or may not index that content.
- You can block Googlebot from crawling your forms by excluding them in your robots.txt file.
- Googlebot will only attempt to crawl GET forms.
- Googlebot tries to avoid forms requesting user IDs, logins, passwords, contact information and so on.
- This should not impact PageRank.
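To make the GET form and robots.txt points concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `example.com` form, its field names and values, and the robots.txt rules are made up for illustration. It simply enumerates every combination of the form's values, which is not necessarily how Google picks values, then uses the standard library's `urllib.robotparser` to check whether a `Disallow` rule on the form's action path would keep Googlebot away from the resulting URLs.

```python
from itertools import product
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

# Hypothetical GET form: its action URL and the option values found in the HTML.
FORM_ACTION = "https://example.com/search"
FORM_FIELDS = {
    "category": ["books", "music"],
    "sort": ["price", "date"],
}

# Hypothetical robots.txt that excludes the form's action path for Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""


def get_form_urls(action, fields):
    """Return every URL that a GET submission of the form could produce."""
    names = list(fields)
    urls = []
    for combo in product(*(fields[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action}?{query}")
    return urls


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


urls = get_form_urls(FORM_ACTION, FORM_FIELDS)
blocked = [is_blocked(ROBOTS_TXT, "Googlebot", url) for url in urls]

for url, flag in zip(urls, blocked):
    print(("blocked " if flag else "allowed ") + url)
```

Because a GET submission is just a URL with a query string, a single `Disallow` rule on the action path covers every value combination, while pages outside that path stay crawlable.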
Matt Cutts of Google explains how this meets a need of the many webmasters who are clueless about SEO. In fact, from the standpoint of making the web more accessible, this new crawling technique rocks. But SEOs and webmasters who want to block Google from accessing content will need to make some changes on their part, i.e., they will have to restructure some of their sites to block Googlebot from crawling their forms.
The forum reaction is very mixed. We have threads at Sphinn, DigitalPoint Forums, Search Engine Watch Forums and WebmasterWorld.
Pros: Google can crawl places it hasn't before and index more of your content, which gives you more visibility. Cons: Pages you do not want indexed might require more work on your part to block.
The big joke in WebmasterWorld is that Googlebot now has a credit card. For example, if it can submit forms, maybe Googlebot will start messing around with conversions. Obviously, it won't place orders, but what about submitting a simple form that you consider to be a conversion (e.g., a user agreement or the like)? In fact, I found Googlebot filling out this Google Checkout form to buy itself some WD40 (kidding, of course):
But you get the point.


Comments:
matt
04/14/2008 08:34 pm
what does this mean exactly?
Rob Abdul
04/15/2008 01:24 pm
When you mark up a web page and create a form with drop-down boxes and input fields, Google can index the names/id tags of those drop-downs and input fields.
Bill Kruse
04/15/2008 05:25 pm
How is this news? Years ago I used to use this as a technique, make sure there was a forms page with little content in it but related key phrases in the drop-down menu. As I recall, the page did well for the phrases. These were only extant in the drop-downs, I don't believe there were inbounds using anchors to the form. BB
Barry Schwartz
04/15/2008 05:28 pm
Bill, are you joking? This is news.
Bill Kruse
05/01/2008 01:26 pm
Ah. About a fortnight later and in another context, I begin to get it. Are we talking javascripted drop-downs and menus, the formerly untraversable to Google spiders kind of drop-downs and menus? That IS different. BB
Bill Kruse
05/01/2008 01:59 pm
My immediate (if apparently belated) reaction is that Google are chipping away at my income by making aspects of what I do slowly redundant :-( Further I can no longer trust my spider-checkers to advise me on what can and can't be crawled. Google need really, in light of this, to offer a Google Spider simulator, a completely up-to-date and honest one. Fat chance, eh? I think we'll all still have to carry on as we have regarding spidering for pure site navigation - usability - in general because there's Yahoo and MSN to think of. Keeping content unindexable so your credit card details aren't immortalised by accident in The Wayback Machine, that may well have got disturbingly harder. We'll all be back to using phones for everything confidential soon. This could well be a step forward that has us fleeing back into the past. BB
No Name
06/15/2010 05:39 pm
That's really a mess. As far as I know, forms were invented to give selective content, depending on the parameters you set in the form: checkboxes, radio buttons, etc. So if Google indexes it, it should crawl all the variants with all parameter combinations, IMHO.
Peter Hanson
10/06/2012 01:57 am
Let Google crawl your pages, your search boxes, your forms, even up your butt, as you need Google more than Google needs you. NEVER block Google or you WILL be very sorry that you did.