Twitter Revamps Search Engine Backend

Oct 7, 2010 • 7:50 am | Filed Under Social Search Engines & Optimization
 

Twitter announced they have "launched a new backend for search on twitter.com." In short, they moved from the original Summize technology they bought years ago to an infrastructure and system that is completely new and homegrown.

Tedster at WebmasterWorld pulls out the key differences:

  • Twitter's real-time search engine was, until very recently, based on the technology that Summize originally developed.
  • [Now we have] a new, modern search architecture based on a highly efficient inverted index instead of a relational database.
  • With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines.
  • We estimate that we're only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get!
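To make the "inverted index instead of a relational database" point concrete, here is a minimal sketch of the idea in Python. It is not Twitter's code or Lucene's API; the function names and the sample tweets are made up for illustration. The point is only that each term maps directly to the tweets containing it, so a query looks up a few lists instead of scanning rows.

```python
from collections import defaultdict


def build_inverted_index(tweets):
    """Map each term to the sorted list of tweet IDs that contain it."""
    index = defaultdict(list)
    for tweet_id, text in sorted(tweets.items()):
        # A set ensures each tweet is listed once per term.
        for term in set(text.lower().split()):
            index[term].append(tweet_id)
    return index


def search(index, query):
    """Return IDs of tweets containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample data, only for demonstration.
tweets = {
    1: "new search backend launched",
    2: "lucene powers the new search",
    3: "coffee time",
}
index = build_inverted_index(tweets)
print(search(index, "new search"))  # [1, 2]
```

A real engine adds tokenization, ranking and compression on top, but the lookup structure is the same in spirit.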

Note that those 1 billion queries per day are not human searches. I strongly recommend you read Danny's piece on that.
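As a quick sanity check, the quoted per-second rates can be multiplied out. The sketch below takes the quoted numbers as exact, although Twitter gave them as "over" 1,000 TPS and "roughly" 50 times, so treat the results as lower bounds or approximations; the variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_second = 12_000  # quoted QPS
tweets_per_second = 1_000    # quoted TPS ("over 1,000")
indexer_headroom = 50        # "roughly 50 times more Tweets per second"

queries_per_day = queries_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY
max_indexable_tps = tweets_per_second * indexer_headroom

print(queries_per_day)    # 1036800000, i.e. just over 1 billion
print(tweets_per_day)     # 86400000
print(max_indexable_tps)  # 50000
```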

Twitter said they chose Lucene, a search engine library written in Java, as a starting point, but not without modifications. The things Twitter changed include significantly improved garbage collection performance, lock-free data structures and algorithms, posting lists that are traversable in reverse order, and efficient early query termination.
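Two of those changes, reverse-order posting lists and early query termination, work together for real-time search: if tweets are appended in time order, reading a term's list backwards yields the newest matches first, and the query can stop as soon as it has enough. The following is a rough Python illustration of that idea under those assumptions. The class and function names are hypothetical and do not come from Lucene or from Twitter's implementation.

```python
class PostingList:
    """Append-only list of tweet IDs for one term, oldest first."""

    def __init__(self):
        self._ids = []

    def append(self, tweet_id):
        # Assumes tweets arrive in time order, so IDs are increasing.
        self._ids.append(tweet_id)

    def newest_first(self):
        # Walk the list backwards so the most recent tweets come out first.
        return reversed(self._ids)


def latest_matches(postings, limit):
    """Collect up to `limit` of the newest IDs, then stop reading."""
    results = []
    for tweet_id in postings.newest_first():
        if len(results) >= limit:
            break  # early termination: older postings are never visited
        results.append(tweet_id)
    return results


postings = PostingList()
for tweet_id in [10, 20, 30, 40]:
    postings.append(tweet_id)

print(latest_matches(postings, 2))  # [40, 30]
```

Because the loop stops after `limit` hits, the cost of a query depends on how many results are wanted, not on how long the posting list has grown.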

Forum discussion at WebmasterWorld.
