
Peeking Into Google Reveals More of Google's Architecture

A March 2nd article at InternetNews.com has Urs Hoelzle, Google's vice president of operations and vice president of engineering, giving a "behind-the-scenes tour of Google's architecture."

Bill Slawski at Cre8asite Forums started a thread on this article, titled "Google's architecture, Informative news story," where he pulled out a couple of quotes:

Google replicates the Web pages it caches by splitting them up into pieces it calls "shards." The shards are small enough that several can fit on one machine. And they're replicated on several machines, so that if one breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. The engineers call these "chunk servers."
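To make the shard-and-replica idea in that quote a little more concrete, here is a minimal Python sketch. It is not Google's implementation: the hash-based shard assignment, the machine names, and the helper functions (shard_for, replicas_for, pick_machine) are all assumptions made up for illustration. It only shows the general pattern of splitting data into shards, placing each shard on several machines, and falling back to another copy when one machine breaks.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (for example a page URL) to a shard number.

    Hashing is an assumption here; the article does not say how
    pages are assigned to shards.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(shard: int, machines: list[str], copies: int) -> list[str]:
    """Return the machines holding a copy of the given shard.

    Hypothetical placement rule: consecutive machines, wrapping around.
    """
    return [machines[(shard + i) % len(machines)] for i in range(copies)]


def pick_machine(shard: int, machines: list[str], copies: int, down: set[str]) -> str:
    """Return the first healthy replica, skipping machines that are down."""
    for machine in replicas_for(shard, machines, copies):
        if machine not in down:
            return machine
    raise RuntimeError(f"all replicas of shard {shard} are down")


if __name__ == "__main__":
    machines = ["m0", "m1", "m2", "m3"]  # hypothetical machine names
    shard = shard_for("http://example.com/", num_shards=8)
    print("shard:", shard)
    print("replicas:", replicas_for(shard, machines, copies=3))
    print("serving from:", pick_machine(shard, machines, copies=3, down={"m1"}))
```

With three copies per shard, a single broken machine never makes a shard unavailable, which is the property the quote describes.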
The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.

To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."
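The cluster-scoring idea in that quote can be sketched in a few lines of Python. This is a toy under loud assumptions, not the system Hoelzle describes: the three clusters and their word weights below are written by hand, whereas the article says Google's clusters are learned and named automatically from very large amounts of data. The function names (cluster_probabilities, similarity) are likewise invented for this example. The sketch only shows how scoring text against clusters, rather than against words, lets a query match a page that shares none of its words.

```python
import math
import re
from collections import Counter

# Hand-made clusters with made-up word weights (assumption: a real system
# would learn these automatically from data).
CLUSTERS = {
    "cooking_classes": Counter(
        {"cooking": 3, "class": 2, "courses": 2, "cuisine": 2, "vegetarian": 1}
    ),
    "bay_area": Counter({"bay": 2, "area": 2, "berkeley": 2, "oakland": 1}),
    "car_repair": Counter({"car": 3, "repair": 3, "engine": 2}),
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_probabilities(text: str, clusters: dict[str, Counter]) -> dict[str, float]:
    """Score the text against each cluster and normalize the scores to sum to 1."""
    tokens = tokenize(text)
    raw = {name: sum(weights[t] for t in tokens) for name, weights in clusters.items()}
    total = sum(raw.values())
    if total == 0:
        # No known words at all: no evidence for any cluster.
        return {name: 0.0 for name in clusters}
    return {name: score / total for name, score in raw.items()}


def similarity(a: str, b: str, clusters: dict[str, Counter]) -> float:
    """Cosine similarity between the cluster distributions of two texts."""
    pa = cluster_probabilities(a, clusters)
    pb = cluster_probabilities(b, clusters)
    dot = sum(pa[n] * pb[n] for n in clusters)
    norm_a = math.sqrt(sum(v * v for v in pa.values()))
    norm_b = math.sqrt(sum(v * v for v in pb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = "Bay Area cooking class"
    page = "Berkeley courses: vegetarian cuisine"
    print(cluster_probabilities(query, CLUSTERS))
    print(similarity(query, page, CLUSTERS))
    print(similarity(query, "car engine repair", CLUSTERS))
```

The query and the Berkeley page have no words in common, yet both land mostly in the same two clusters, so they compare as similar, while the car repair text does not.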

The article is definitely worth a read. Afterwards, join the forum discussion at Cre8asite Forums.




Posted by rustybrick in Google Optimization on March 13, 2006 at 7:55 AM. Comments (2)

Comments

The source for the information in this article is this video featuring Google PhD Jeff Dean:

http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?asx=mms://videosrv6.cs.washington.edu/talks/Colloquia/JDean_041021_OnDemand_100_256K_320x240.wmv

It's a Google employee recruitment session held at the University of Washington.

The clustering demonstration begins at 35:30

It is important to note that Dr. Dean differentiates between the way search engines work TODAY and what the GOAL is. He calls the clustering search tool a Demo and a Model and the tool is prominently marked DEMO.

 

The only flaw is that the so-called "REVERSE ENGINEERING" technology was not explained, unless "SHARDS" are the same thing or encompass R.E.

 
