Tuesday, August 11

Google Unveils New 'Caffeine' Search Infrastructure Update

Caffeine is a top-secret project to rewrite Google's indexing system, and it's finally being released. According to this interview with Matt, infrastructure-wise, this compares with the BigDaddy update in 2006. There have been major changes under the hood to make indexing more flexible, faster, and more robust. According to the Google post:
For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google's web search.
You can try an index served on the new architecture in the sandbox they set up for public testing. Notice anything different?

Matt Cutts has a post on his blog. The infrastructure team have been working hard:
...a few weeks ago, I joked that the half-life of code at Google is about six months. That means that you can write some code and when you circle back around in six months, about half of that code has been replaced with better abstractions or cleaner infrastructure...
Congratulations to the infrastructure team: I didn't notice a significant difference in the results. I expect this will help Google to significantly increase the size and freshness of their index.

You may remember Cuil. Despite getting knocked pretty hard, Cuil was not about next-generation ranking; it was about infrastructure. Read my post for details. It's not clear, but perhaps the Caffeine update tackles some of the issues that Anna Patterson, former Google infrastructure architect, recounted in a Cuil interview:
If they [Google] wanted to triple size of their index, they'd have to triple the size of every server and cluster. It's not easy or fast...increasing the index size will be 'non-trivial' exercise.

Has Google tackled these architecture issues with 'Caffeine'? We may never know.


