Twitter Revamps Search Engine Backend

Oct 7, 2010 - 7:50 am, by Barry Schwartz

Filed Under Social Search

Twitter announced they have "launched a new backend for search on twitter.com." In short, they moved from the original Summize technology they bought years ago to an infrastructure and system that is completely new and homegrown.

Tedster at WebmasterWorld pulls out the key differences:

Twitter's real-time search engine was, until very recently, based on the technology that Summize originally developed.
[Now we have] a new, modern search architecture based on a highly efficient inverted index instead of a relational database.
With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines.
We estimate that we're only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get!
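To make the "inverted index instead of a relational database" point concrete, here is a minimal sketch of the idea in Python. It is not Twitter's code and not Lucene's API. The function names, the whitespace tokenizer, and the sample tweets are all assumptions for illustration. The point is that a query looks up each term directly and intersects short ID lists rather than scanning every stored tweet.

```python
from collections import defaultdict


def build_inverted_index(tweets):
    """Map each lowercase term to the ascending list of tweet IDs containing it.

    `tweets` is a hypothetical dict of tweet ID -> text.
    """
    index = defaultdict(list)
    for tweet_id in sorted(tweets):
        # set() so a term repeated inside one tweet is recorded only once
        for term in set(tweets[tweet_id].lower().split()):
            index[term].append(tweet_id)
    return index


def search(index, query):
    """Return IDs of tweets that contain every term in the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample data
tweets = {
    1: "new search backend launched",
    2: "lucene is a search library",
    3: "java garbage collection tips",
}
index = build_inverted_index(tweets)
print(search(index, "search"))          # [1, 2]
print(search(index, "search library"))  # [2]
```

A real engine would add proper tokenization, ranking, and compressed on-disk structures, none of which this sketch attempts.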

Regarding the 1 billion queries per day, they are not human searches. I strongly recommend you read Danny's piece on that.
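For anyone who wants to check the math behind the quoted numbers, here is a quick sketch. It treats the quoted rates as sustained around the clock and as lower bounds ("over 1,000 TPS", "roughly 50 times"), which is an assumption on my part. The variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_second = 12_000  # quoted QPS
tweets_per_second = 1_000    # quoted TPS, stated as "over 1,000"

queries_per_day = queries_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

# "roughly 50 times more Tweets per second than we currently get"
indexer_capacity_tps = tweets_per_second * 50

print(queries_per_day)       # 1036800000
print(tweets_per_day)        # 86400000
print(indexer_capacity_tps)  # 50000
```

So 12,000 queries per second does work out to a bit over 1 billion queries per day, and the indexer claim implies room for roughly 50,000 Tweets per second.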

Twitter said they chose Lucene, a search engine library written in Java, as a starting point, but not without modifications. The things Twitter changed include significantly improved garbage collection performance, lock-free data structures and algorithms, posting lists that are traversable in reverse order, and efficient early query termination.
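Two of those changes fit together nicely for real-time search. If each term's posting list is appended to in arrival order, walking it backwards yields the newest tweets first, and the query can stop as soon as it has enough results. The sketch below illustrates that idea only. It is not Twitter's implementation or a Lucene API, and the function name, the `postings` layout, and the sample IDs are assumptions.

```python
def newest_matches(postings, terms, limit):
    """Return up to `limit` tweet IDs containing all `terms`, newest first.

    `postings` is a hypothetical dict of term -> list of unique tweet IDs
    in ascending (arrival) order.
    """
    lists = [postings.get(term, []) for term in terms]
    if not lists or any(not plist for plist in lists):
        return []

    # Start every cursor at the end of its list, i.e. at the newest tweet.
    cursors = [len(plist) - 1 for plist in lists]
    hits = []

    # Early termination: stop once `limit` hits are found,
    # without touching the older part of any list.
    while len(hits) < limit and all(c >= 0 for c in cursors):
        current = [plist[c] for plist, c in zip(lists, cursors)]
        lowest = min(current)
        if all(value == lowest for value in current):
            hits.append(lowest)  # every term appears in this tweet
            cursors = [c - 1 for c in cursors]
        else:
            # Step back only the cursors sitting on IDs newer than `lowest`.
            cursors = [
                c - 1 if value > lowest else c
                for value, c in zip(current, cursors)
            ]
    return hits


# Made-up sample data
postings = {
    "search": [1, 2, 5, 8, 9],
    "twitter": [2, 3, 8, 9, 10],
}
print(newest_matches(postings, ["search", "twitter"], 2))   # [9, 8]
print(newest_matches(postings, ["search", "twitter"], 10))  # [9, 8, 2]
```

With a limit of 2, the loop never looks at the older entries, which is the whole appeal when users mostly want the latest tweets.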

Forum discussion at WebmasterWorld.
