Many SEOs use the site command to see how healthy their site is in a particular search engine. You plug site:www.mydomain.com into a search engine, and the engine returns the number of pages it has indexed for that domain. If you know you have a hundred pages and the search engine has indexed 90% of them, then you are pretty well off.
But the problem is, the site command is often not all that reliable. We have had recent reports that Google is dropping pages, and recent reports that Microsoft Live Search is dropping pages as well. Most SEOs judge a drop in indexed pages by the number of results the engine returns for a site command.
But is this a valid way of determining how many pages of your site a search engine has indexed? From what I am hearing from search engine representatives at both Google and Microsoft, the answer is no. A webmaster should not depend on the number returned by a site command as a reliable indicator of how many pages of their site a search engine has indexed.
Googler JohnMu gave three possible explanations for a change in this number in a recent Google Groups thread, which together show why SEOs and webmasters should not depend on it:
- The previous approximation was incorrect, the current one is closer to the actual number of URLs that we have indexed or would show to users
- The previous approximation was close and the current one is worse than before (this can happen)
- A change in our algorithms (we make a lot of changes that will impact crawling, indexing and ranking -- for some sites perhaps more than for others)
At the same time, Microsoft's Jeremiah Andrick told me that it "is problematic to use the 'site:' operator to determine how many pages for a site are included in the Live Search index. The 'Site:' operator generates an estimate of the pages in the index. These numbers can vary wildly depending on when you execute the query."
That being said, how can you get an accurate number of pages indexed by a search engine for your site?
I know Google's Webmaster Tools has a place in its Sitemaps section that shows you the number of pages submitted in your Sitemap compared to how many URLs are actually indexed. So, this might be a better indicator, but I am nervous about this number, because way too often I hear of reporting glitches in Webmaster Tools.
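If you want to run that submitted-versus-indexed comparison yourself, here is a minimal Python sketch. It counts the URLs in a sitemap that follows the standard sitemaps.org format and divides an indexed count by that total. The function names are made up for illustration, and the indexed figure is an assumption: you would copy it by hand from whatever Webmaster Tools reports, so it is only as trustworthy as that report.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml):
    """Count the <url> entries that carry a non-empty <loc> in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    count = 0
    for url in root.findall(SITEMAP_NS + "url"):
        loc = url.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text and loc.text.strip():
            count += 1
    return count


def indexed_ratio(submitted, indexed):
    """Return the share of submitted URLs that are reported as indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number of URLs")
    return indexed / submitted


if __name__ == "__main__":
    # Tiny hand-written sitemap; a real one would be read from your own file.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://www.mydomain.com/</loc></url>"
        "<url><loc>http://www.mydomain.com/about</loc></url>"
        "</urlset>"
    )
    submitted = count_sitemap_urls(sample)
    indexed = 1  # hypothetical figure copied from a Webmaster Tools report
    print(f"{indexed} of {submitted} submitted URLs indexed "
          f"({indexed_ratio(submitted, indexed):.0%})")
```

With the hundred-page example from the top of this post, `indexed_ratio(100, 90)` gives 0.9, the same 90% figure.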
Another option is to track each and every keyword phrase your pages rank for. Then check, keyword by keyword rather than by site command, whether those pages still show up in the results. This can be time-consuming, but there are ways to automate it.
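As a rough illustration of the keyword approach, the sketch below checks which of your pages turned up for the phrases you track. It does not query any search engine. It assumes you already have, from whatever rank-tracking tool you use, a list of result URLs per keyword. The function name and the sample keywords and URLs are hypothetical.

```python
def pages_seen_by_keyword(expected, rankings):
    """Split tracked pages into those seen in results and those not seen.

    expected: dict mapping a keyword phrase to the URL you expect to rank.
    rankings: dict mapping a keyword phrase to the list of result URLs
              you collected for it (assumed to come from your own tracker).
    Returns (seen, not_seen) as two sets of URLs.
    """
    seen, not_seen = set(), set()
    for keyword, url in expected.items():
        if url in rankings.get(keyword, []):
            seen.add(url)
        else:
            not_seen.add(url)
    # A page that showed up for at least one keyword is clearly indexed,
    # even if it was missing for another keyword.
    return seen, not_seen - seen


if __name__ == "__main__":
    expected = {
        "blue widgets": "http://www.mydomain.com/blue",
        "cheap blue widgets": "http://www.mydomain.com/blue",
        "red widgets": "http://www.mydomain.com/red",
    }
    rankings = {
        "blue widgets": ["http://www.example.com/", "http://www.mydomain.com/blue"],
        "red widgets": ["http://www.example.com/red"],
    }
    seen, not_seen = pages_seen_by_keyword(expected, rankings)
    print("Seen in results:", sorted(seen))
    print("Not seen:", sorted(not_seen))
```

A page in the "not seen" set is not necessarily dropped from the index. It may simply have slipped out of the results you collected, so treat that list as pages to look at more closely.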
Overall, using the site command might not be the best way to determine how healthy your site is in a particular search engine. I know many SEOs use this as a factor, but maybe it is time we think again about this?
Forum discussion at Google Groups.