Twitter Revamps Search Engine Backend

Oct 7, 2010 • 7:50 am | comments (3) | Filed Under Social Search Engines & Optimization
 

Twitter announced they have "launched a new backend for search on twitter.com." In short, they moved from the original Summize technology they bought years ago to an infrastructure and system that is completely new and homegrown.

Tedster at WebmasterWorld pulls out the key differences:

  • Twitter's real-time search engine was, until very recently, based on the technology that Summize originally developed.
  • [Now we have] a new, modern search architecture based on a highly efficient inverted index instead of a relational database.
  • With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines.
  • We estimate that we're only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get!
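To make the "inverted index instead of a relational database" point concrete, here is a minimal sketch of the idea in Python. It is a toy illustration only, not Twitter's or Lucene's implementation; the class name `InvertedIndex`, its methods, and the sample tweets are all hypothetical. Each term maps directly to the list of tweet IDs that contain it (its posting list), so a query only touches the lists for its own terms.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> posting list of tweet IDs."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, tweet_id, text):
        # Naive whitespace tokenization; each distinct term records the ID once.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)

    def search(self, query):
        # AND semantics: return IDs of tweets that contain every query term.
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


index = InvertedIndex()
index.add(1, "Twitter launches new search backend")
index.add(2, "Lucene powers the new search")
index.add(3, "Summize was the old backend")
print(index.search("new search"))  # [1, 2]
```

A real engine adds tokenization rules, compression, ranking and concurrency control on top of this basic structure.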

Regarding the 1 billion queries per day, they are not human searches. I strongly recommend you read Danny's piece on that.
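For anyone who wants to check the numbers, the quoted figures can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic on the rounded values from the quote ("over 1,000 TPS", "12,000 QPS", "roughly 50 times"), so treat the results as approximate lower bounds rather than measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

qps = 12_000  # queries per second, as quoted
queries_per_day = qps * SECONDS_PER_DAY
print(queries_per_day)  # 1036800000, i.e. just over 1 billion

tps = 1_000  # Tweets per second, as quoted
tweets_per_day = tps * SECONDS_PER_DAY
print(tweets_per_day)  # 86400000

# "Roughly 50 times more Tweets per second" read as a simple multiplier.
indexer_capacity_tps = tps * 50
print(indexer_capacity_tps)  # 50000
```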

Twitter said they chose Lucene, a search engine library written in Java, as a starting point, but not without modifications. The changes Twitter made include significantly improved garbage collection performance, lock-free data structures and algorithms, posting lists that are traversable in reverse order, and efficient early query termination.
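The last two items fit together naturally for real-time search, where the newest Tweets matter most. The Python sketch below shows the general idea under some assumptions: tweet IDs increase over time, each posting list is stored in ascending ID order, and the function name `newest_matches` is made up for illustration. It is not Twitter's or Lucene's code. Two posting lists are intersected by walking both from the end, and the loop stops as soon as enough hits have been collected.

```python
def newest_matches(list_a, list_b, limit):
    """Intersect two ascending posting lists, newest first, stopping early.

    Assumes larger tweet IDs are newer (an assumption of this sketch).
    """
    i, j = len(list_a) - 1, len(list_b) - 1
    hits = []
    # Early termination: the loop ends once `limit` hits are found,
    # leaving the older part of both lists untouched.
    while i >= 0 and j >= 0 and len(hits) < limit:
        if list_a[i] == list_b[j]:
            hits.append(list_a[i])
            i -= 1
            j -= 1
        elif list_a[i] > list_b[j]:
            # list_a[i] is newer than anything left in list_b, so skip it.
            i -= 1
        else:
            j -= 1
    return hits


postings_search = [1, 4, 7, 9, 12]
postings_lucene = [2, 4, 9, 12, 15]
print(newest_matches(postings_search, postings_lucene, 2))  # [12, 9]
```

With `limit=2` the scan never reaches tweet 4, even though it also matches; that skipped work is the point of early termination.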

Forum discussion at WebmasterWorld.

 

Comments:

Ilan

10/07/2010 02:02 pm

Lucene has a port in C++ (called CLucene). It sounds like they didn't use that, though. They do say that they're contributing the changes back to the open-source community in a new "realtime" branch. That's cool! I was going to use Lucene in an upcoming project, and I'm sure some of their contribution would benefit me as well.

Michael Martinez

10/07/2010 04:11 pm

Trying to draw distinctions between human and machine searches is fruitless. Even the major search engines are being hammered by scrapers that ignore the APIs.

Barry Schwartz

10/07/2010 04:17 pm

No, not just about scrapers, but also about APIs and so on.
