A Google Webmaster Help thread has some interesting details from Google's John Mueller about content removal from Google's index and/or search results.
You may already know some of these points, but every SEO and webmaster should understand all of them. Heck, a few were eye-opening even to me.
Here are the raw points John made and then I'll share what I think is revealing:
- The URL removal tool is not meant to be used for normal site maintenance like this. This is part of the reason why we have a limit there.
- The URL removal tool does not remove URLs from the index, it removes them from our search results. The difference is subtle, but it's a part of the reason why you don't see those submissions affect the indexed URL count.
- The robots.txt file doesn't remove content from our index, but since we won't be able to recrawl it and see the content there, those URLs are generally not as visible in search anymore.
- In order to remove the content from our index, we need to be able to crawl it, and we should see a noindex robots meta tag, or a 404/410 HTTP result code (or a redirect, etc). In order to crawl it, the URL needs to be "not disallowed" by the robots.txt file.
- We generally treat 404 the same as 410, with a tiny difference in that 410 URLs usually don't need to be confirmed by recrawling, so they end up being removed from the index a tiny bit faster. In practice, the difference is not critical, but if you have the ability to use a 410 for content that's really removed, that's a good practice.
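To make the crawl-then-remove logic in John's points concrete, here is a minimal Python sketch. It is my own illustration, not anything Google published: the function name `removal_signal`, its return strings, and the example URLs are all hypothetical. It uses only the standard library to ask two questions. First, is the URL crawlable under a given robots.txt? Second, if it is, does the response carry one of the removal signals John lists (a 404/410, a redirect, or a noindex robots meta tag)?

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def removal_signal(robots_txt, url, status, html="", agent="Googlebot"):
    """Hypothetical helper: report which removal signal a crawler could see.

    robots_txt is the text of the site's robots.txt, status is the HTTP
    status code returned for url, and html is the response body (if any).
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # A disallowed URL is never fetched, so no signal on the page is seen.
    if not rules.can_fetch(agent, url):
        return "blocked by robots.txt"
    if status in (404, 410):
        return f"{status} response"
    if 300 <= status < 400:
        return "redirect"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "noindex meta tag"
    return "no removal signal"


blocking_rules = "User-agent: *\nDisallow: /old/"
print(removal_signal(blocking_rules, "https://example.com/old/page", 410))
print(removal_signal("", "https://example.com/old/page", 410))
```

The first call shows the trap John describes: the page returns a 410, but because robots.txt disallows it, a crawler never gets to see that status code.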
I find the 404 versus 410 point very interesting. With a 404 result code, Google will typically recrawl to verify the page is really not found. But if you serve up a 410, Google may not need to recrawl to verify the page is gone. This is an important thing for webmasters to know. A 404 is the safer default, but a 410 seems to get content removed a bit more quickly when you are sure it is gone for good.
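If you want to serve a 410 for content you have deliberately removed and a 404 for everything else that is missing, the decision can be as simple as a lookup. The sketch below is an illustration only, using Python's built-in `http.server`; the path sets `LIVE_PAGES` and `REMOVED_PAGES` and the helper `status_for` are made-up names, and a real site would do this in its own web server or CMS configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical example data: pages that exist, and pages removed on purpose.
LIVE_PAGES = {"/", "/about"}
REMOVED_PAGES = {"/old-promo"}


def status_for(path):
    """Pick a status code: 200 if live, 410 if removed on purpose, else 404."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PAGES:
        return HTTPStatus.GONE
    return HTTPStatus.NOT_FOUND


class RemovalAwareHandler(BaseHTTPRequestHandler):
    """Minimal handler that answers every GET with the chosen status."""

    def do_GET(self):
        status = status_for(self.path)
        body = status.phrase.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass `RemovalAwareHandler` to `http.server.HTTPServer(("localhost", 8000), RemovalAwareHandler)` and call `serve_forever()`.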
The second point is that, according to Google, the URL removal tool does not remove URLs from the index; it only removes them from Google's search results. Many know this, but it is worth pointing out as well.
Forum discussion at Google Webmaster Help.