Is there any official Google statement saying that search results on one's own site ought to be disallowed from indexing (e.g. via robots.txt)?
He cited an example of YouTube search results (a search page at YouTube) coming up in Google Web search results.
Over the weekend, Kevin Gibbons started a thread at Search Engine Roundtable Forums asking why Google Maps results show up as indexed in the Google search results. His example search was bondi beach via Google UK, which shows this page from Google Maps listed in the search results. As you can see from the robots.txt file, Google disallows crawling of that page. So why is this page listed in the Google results? Well, it is not really indexed. Someone must have linked to the page, so Google shows the URL based on that linkage data alone, which is normal on some searches.
Back to the YouTube case, Google's Matt Cutts said "YouTube added a “Disallow: /results” line in its robots.txt file." And Vanessa Fox has added a line to the Google Webmaster Guidelines stating:
Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.
So there it is. Google does not want to show its own properties' search results in the Google search results. Nor does it want your search results to come up, if they don't add value. So I suspect I should disallow searchinternal.html on this site from being crawled, because at least one result has been indexed.
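To see what such a rule does in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It is only an illustration under some assumptions: the two robots.txt snippets are made up for the example (the first mirrors the "Disallow: /results" line Matt Cutts mentioned, the second assumes searchinternal.html sits at the root of this site), and the sample URLs and query strings are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumption: a one-rule file modeled on the YouTube line quoted above.
youtube_rules = build_parser(
    "User-agent: *\n"
    "Disallow: /results\n"
)

# Assumption: searchinternal.html lives at the site root.
site_rules = build_parser(
    "User-agent: *\n"
    "Disallow: /searchinternal.html\n"
)

# Hypothetical URLs, used only to exercise the rules.
blocked_search = youtube_rules.can_fetch(
    "Googlebot", "https://www.youtube.com/results?search_query=bondi+beach"
)
allowed_video = youtube_rules.can_fetch(
    "Googlebot", "https://www.youtube.com/watch?v=example"
)
blocked_internal = site_rules.can_fetch(
    "Googlebot", "https://www.seroundtable.com/searchinternal.html?q=reputation"
)

print(blocked_search, allowed_video, blocked_internal)
```

Keep in mind that a Disallow rule only stops crawling. As the Google Maps example shows, a blocked URL can still be listed if other pages link to it.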
In some cases, site navigational elements are, in a sense, a search feature. You have ways to filter the results for your products, using links to do so. Let me give you an example. Footlocker.com has a Men's Basketball sneaker section. I can refine my search by filtering down, via link navigation, to Men's Nike Basketball sneakers. I can then refine it a bit more by selecting the $50 - $74.99 price range. Is that the type of search result Google is talking about? I doubt it, because those pages add value.
But what about my example above of showing all my posts on reputation management via the internal search feature? Is that bad? I found it useful, so I linked to it. Maybe others looking for online reputation management would rather see several examples from this site, as opposed to just one? I think in this case, Google would rather show one or two results and offer the "More results from www.seroundtable.com »" link, if they deem it necessary. So in this case, I should really disallow that page. But I have yet to add a robots.txt file, so maybe one day. :)
Danny Sullivan also has a great post at Search Engine Land on this topic from this morning. Danny, in part, looks at shopping search engines and asks whether Google should index their search results pages when it doesn't index its own Froogle results.