What is Really Inside the Google Index?

Sep 9, 2004 • 10:44 am | Filed Under Google Search Engine
 

Ever look underneath the pillows of your sofa and find nice goodies? Yeah, well, you also find things you do not necessarily want to find. ;) A forum thread at Search Engine Watch questions the accuracy of the number of pages reported in the Google index.

In summary the issue is as follows.

Go to Google's home page and you will see it read "Searching 4,285,199,774 web pages". That means Google has 4,285,199,774 web pages in its index. So if you search on any keyword, you should never find more results than 4,285,199,774, right? Well, wrong. Do a search on "the" at Google, and you will see 5,800,000,000 results reported. How can there be a difference of 1,514,800,226 pages?

Some of the answers include: (1) Google's home page does not show a real-time value for the number of pages indexed by Google. I find it hard to believe that Google wouldn't update that figure after breaking 1,000,000,000 pages. (2) Google cannot break the limit of the "unsigned long integer in ANSI C to assign a unique ID" to every page it indexes. In that case, there are ways around the four-byte limit of 4,294,967,295. (3) Google has a supplemental index. Does the 4,285,199,774 web pages figure include pages in the supplemental index?
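To make the numbers in answer (2) concrete, here is a quick Python sketch of the arithmetic. It only uses the figures quoted above. The names (`HOMEPAGE_COUNT`, `THE_RESULTS`, `fits_in_uint32`) are mine, picked for illustration, and nothing here says how Google actually stores its page IDs.

```python
# Figures quoted in this post (September 2004).
HOMEPAGE_COUNT = 4_285_199_774  # "Searching ... web pages" on Google's home page
THE_RESULTS = 5_800_000_000     # results reported for a search on "the"

# Largest value a four-byte (32-bit) unsigned integer can hold: 2**32 - 1.
UINT32_MAX = 2**32 - 1


def fits_in_uint32(n: int) -> bool:
    """Return True if n can be stored in a four-byte unsigned integer."""
    return 0 <= n <= UINT32_MAX


# Difference between the reported results and the home page count.
gap = THE_RESULTS - HOMEPAGE_COUNT

# How far the home page count sits below the four-byte ceiling.
headroom = UINT32_MAX - HOMEPAGE_COUNT

print(f"four-byte limit:     {UINT32_MAX:,}")
print(f"gap:                 {gap:,}")
print(f"headroom to limit:   {headroom:,}")
print(f"home page fits:      {fits_in_uint32(HOMEPAGE_COUNT)}")
print(f"'the' results fit:   {fits_in_uint32(THE_RESULTS)}")
```

The home page figure lands just under the four-byte ceiling, while the result count for "the" is well over it. That is consistent with the unique ID theory, but it is not proof of anything on its own.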

I have started a new thread at Search Engine Watch named "How Do the Search Indexes Work?" I am hoping to get a better understanding of the various indexes at the search engines: what they include in each index, and which numbers they use to give users a count of the pages indexed.
