Monday, February 26

Personalized PageRank at Google

Hi, my apologies for the long delay between posts. If you haven't noticed, my blogging frequency has declined greatly since I started. Some would blame it on getting married ;-). That has been a reason, but I have also been quite busy at work and I have been burnt out.

In my spare time, I've taken to watching Heroes and playing with Google Reader.

I have had posts to myself accumulating in my inbox, so I guess the blogging bug is starting again after burning out.

I'll resume with this short piece I found the other day, an interview with Marissa Mayer on personalized search at Google. It ran on Search Engine Land, which I have taken to reading and quite enjoy. Here are the interesting parts of the article:
We acquired a very talented team in March of 2003 from Kaltix. It was a group of three students from Stanford doing their Ph.D, headed up by a guy named Sep Kamvar, who is the fellow who cosigned the post with me to the blog. Sep and his team did a lot of PageRank style work at Stanford. Interestingly enough, one of the papers they produced was on how to compute PageRank faster...

Interestingly enough, the reason they were interested in building a faster version of PageRank was because what they wanted to do was be able to build a PageRank for each user. So, based on seed data on which pages were important to you, and what pages you seemed to visit often, re-computing PageRank values based on that. PageRank as an algorithm is very sensitive to the seed pages. And so, what they were doing, was that they had figured out a way to sort by host and as a result of sorting by host, be able to compute PageRank in a much more computationally efficient way to make it feasible to compute a PageRank per user, or as a vector of values that are different from the base PageRank...

We acquired them in 2003 and we've worked for some time since to outfit our production system to be capable of doing that computation and holding a vector for each user in parallel to the base computation... So if you have a site about baseball you can say you want to base it on these three of your favorite baseball sites and have a search box that has a PageRank that's veered in that direction for baseball queries.
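To make the "PageRank per user" idea concrete, here is a minimal sketch of personalized PageRank by plain power iteration. This is not the Kaltix or Google implementation (the interview only hints at the host-sorting trick, and I don't attempt it here). The toy graph, the function name `personalized_pagerank`, and the damping value of 0.85 are my own assumptions for illustration. The only change from ordinary PageRank is that random jumps land on the user's seed pages instead of on every page uniformly.

```python
def personalized_pagerank(links, seeds, damping=0.85, iterations=50):
    """Power iteration where random jumps land only on the seed pages.

    links: dict mapping each page to the list of pages it links to
           (every link target must also appear as a key).
    seeds: set of pages the user cares about (hypothetical input).
    """
    pages = sorted(links)
    # Teleport vector: uniform over the seeds, zero everywhere else.
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        # Every page starts with its share of the random-jump mass.
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back to the seeds.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


# Toy four-page web (made up for this example).
links = {
    "news": ["baseball", "cooking"],
    "baseball": ["stats", "news"],
    "stats": ["baseball"],
    "cooking": ["news"],
}

base = personalized_pagerank(links, seeds=set(links))            # ordinary PageRank
fan = personalized_pagerank(links, seeds={"baseball", "stats"})  # baseball fan

for page in sorted(links):
    print(f"{page:10s} base={base[page]:.3f} fan={fan[page]:.3f}")
```

With every page as a seed this reduces to ordinary PageRank. With only the two baseball pages as seeds, the scores shift toward those pages and away from the cooking page, which is the "veered in that direction" effect the quote describes. Running the full iteration once per user is also why a faster PageRank computation mattered to them.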

And on how personalized results are integrated into Google results today:
The actual implementation of personalized search is that as many as two pages of content, that are personalized to you, could be lifted onto the first page and I believe they never displace the first result, in our current substantiation, because that's a level of relevance that we feel comfortable with. So right now, at least eight of the results on your first page will be generic, vanilla Google results for that query and only up to two of them will be results from the personalized algorithm.
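The blending rule in that quote can be sketched in a few lines. This is only my reading of it, not Google's code: the function name `blend_results` and the choice to place the personalized results directly after the top result are assumptions, since the interview only says that up to two personalized results are lifted onto the first page and that the first result is never displaced.

```python
def blend_results(generic, personalized, page_size=10, max_personalized=2):
    """Lift up to max_personalized results onto the first page of generic results.

    The top generic result always stays in first position. Where the lifted
    results go after that is an assumption; here they sit right below it.
    """
    first_page = generic[:page_size]
    if not first_page:
        return []
    # Pick personalized results that are not already on the first page.
    picks = []
    for url in personalized:
        if len(picks) >= max_personalized:
            break
        if url not in first_page and url not in picks:
            picks.append(url)
    # Top generic result, then the lifted results, then the remaining
    # generic results, trimmed back to one page.
    return ([first_page[0]] + picks + first_page[1:])[:page_size]


# Made-up result lists: g* are generic results, p* are personalized ones.
generic = [f"g{i}" for i in range(1, 13)]
personalized = ["p1", "g2", "p2", "p3"]

page = blend_results(generic, personalized)
print(page)
```

On a ten-result page this leaves at least eight generic results and at most two personalized ones, matching the numbers in the quote.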

Finally, an interesting note on query refinements vs personalization:
When you look at the overall utility, probably 1 to 5% of people will click those query refinements on any given search, where most users, probably more than two thirds of users, end up using one of our [personalized] results. So in terms of utility and value that is delivered to the end user, the search results themselves and personalizing those are an order of magnitude more impactful than personalizing a query refinement.
