Twitter announced they have "launched a new backend for search on twitter.com." In short, they moved from the original Summize technology they bought years ago to an infrastructure and system that is completely new and homegrown.
Tedster at WebmasterWorld pulls out the key differences:
- Twitter's real-time search engine was, until very recently, based on the technology that Summize originally developed.
- [Now we have] a new, modern search architecture based on a highly efficient inverted index instead of a relational database.
- With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines.
- We estimate that we're only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get!
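The figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the rates are the rounded "over 1,000" and "12,000" values from the quote, not measured numbers.

```python
# Back-of-the-envelope check of the quoted figures (rounded values from the quote).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_second = 1_000    # "over 1,000 TPS"
queries_per_second = 12_000  # "12,000 QPS"

queries_per_day = queries_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

# "roughly 50 times more Tweets per second than we currently get"
indexer_capacity_tps = tweets_per_second * 50

print(f"{queries_per_day:,} queries/day")           # 1,036,800,000
print(f"{tweets_per_day:,} tweets/day")             # 86,400,000
print(f"{indexer_capacity_tps:,} tweets/sec peak")  # 50,000
```

So 12,000 QPS does work out to just over 1 billion queries per day, and the quoted 50x indexing headroom would correspond to roughly 50,000 Tweets per second.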
Regarding the 1 billion queries per day, they are not human searches. I strongly recommend you read Danny's piece on that.
Twitter said they chose Lucene, a search engine library written in Java, as a starting point, but not without modifications. The things Twitter changed include significantly improved garbage collection performance, lock-free data structures and algorithms, posting lists that are traversable in reverse order, and efficient early query termination.
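To make two of those ideas concrete, here is a toy sketch of an inverted index whose posting lists are read in reverse (newest first) and whose search stops as soon as it has enough hits. This is not Twitter's code and not Lucene's API; the class `TinyRealtimeIndex` and its methods are hypothetical, it handles single-term queries only, and it ignores concurrency and garbage collection entirely.

```python
from collections import defaultdict
from itertools import islice


class TinyRealtimeIndex:
    """Toy inverted index: each term maps to doc ids in arrival order."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> [doc_id, ...], oldest first
        self.docs = []                     # doc_id -> original text

    def add(self, text):
        """Index one document; newer documents get larger ids."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term, limit=2):
        """Return up to `limit` matching documents, newest first."""
        # Walk the posting list backwards so the newest documents come first.
        newest_first = reversed(self.postings.get(term.lower(), []))
        # islice stops pulling from the iterator after `limit` hits
        # (early termination): older postings are never visited.
        return [self.docs[doc_id] for doc_id in islice(newest_first, limit)]


index = TinyRealtimeIndex()
index.add("lucene is a search library")
index.add("twitter search moved to lucene")
index.add("new search backend launched")

results = index.search("search", limit=2)
print(results)
```

Because new documents are appended to the end of each posting list, reading the list backwards gives the newest results first without sorting, which is why reverse traversal and early termination fit a real-time search workload.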
Forum discussion at WebmasterWorld.

Comments:
Ilan
10/07/2010 02:02 pm
Lucene has a port in C++ (called CLucene). It sounds like they didn't use that, though. They do say that they're contributing the changes back to the open-source community in a new "realtime" branch. That's cool! I was going to use Lucene in an upcoming project, and I'm sure some of their contribution would benefit me as well.
Michael Martinez
10/07/2010 04:11 pm
Trying to draw distinctions between human and machine searches is fruitless. Even the major search engines are being hammered by scrapers that ignore the APIs.
Barry Schwartz
10/07/2010 04:17 pm
No, not just about scrapers, but also about APIs and so on.