Friday, October 24

Yahoo! Inquisitor update: new platforms and personalized results

Inquisitor is a browser plugin (originally for Safari) that provides search-as-you-type and suggested-search capabilities. Yahoo! just announced it is now available for Firefox and IE.
Building on the work by the Yahoo! Research team in the paper "Information Re-Retrieval: Repeat Queries in Yahoo! Logs," the algorithm that generates the personalized results has been enhanced to return more targeted results.
Inquisitor will also search the contents of your bookmarks to help you re-discover old content.

I've been trying it out and I really like it.

See ReadWriteWeb's coverage.

Thursday, October 23

Kleinberg launches new memetracker using Spinn3r

Kevin Burton from Spinn3r announced that Kleinberg's team at Cornell are launching a new memetracker. Check it out.

(Spinn3r also recently announced they were providing the data for the ICWSM data challenge.)

Monday, October 20

ICWSM 2009 Data Challenge

Akshay announced that the data for the ICWSM 2009 data challenge is available.

The dataset consists of 44 million blog posts (27 GB compressed) crawled by Spinn3r between August 1st and October 1st, 2008. The paper deadline is in January, so get to work!

TREC Blog Track approved for 2009

Congratulations to Iadh and team at Glasgow. Their proposal for the 2009 TREC blog track was approved.

It's exciting to see that in at least one task the relevance judgments will cover not only topical relevance, but also the 'quality' of the content. Ignoring quality is one of my major criticisms of the current Cranfield/TREC paradigm and most current academic experiments.

I find the blog track interesting, and not just because I have a blog. I'm interested in utilizing the highly temporal nature of blog posts to study the importance of temporal relevance. For example, to study the trade-off between authority and recency in ranking.

See also my previous discussion of the 2009 blog track.

LETOR 3.0 beta released

Microsoft Research Asia announced the 3.0 beta release of its Learning to Rank (LETOR) benchmarking platform.

You can read the announcement on the website for the full list of updates and changes. As an example, there are new document features for ranking:
"In LETOR3.0 we added in-link number, out-link number, length of URL, number of slashes in the URL, etc. as new features. Also, we extracted those existing features in all streams (URL, title, anchor and body), while features in some streams are missing in LETOR2.0. Overall, there are 64 features (Table. 2) which can be directly used by learning algorithms."
Also of note, the document parser used to index the documents changed (different tf counts) and the definition of some of the document fields differs slightly from version 2.0. Furthermore, the IDF calculation changed significantly.
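
To make the new link and URL features from the announcement concrete, here is a minimal Python sketch of how such values could be computed for a toy collection. This is only an illustration, not LETOR's actual extraction code: the function names, the example URLs, and the choice to count every slash in the raw URL string (including the two after the scheme) are my own assumptions.

```python
from collections import Counter


def url_features(url):
    """Two simple URL features: total length and number of slashes.

    Assumption: slashes are counted over the whole URL string,
    including the "//" after the scheme. LETOR may define it differently.
    """
    return {"url_length": len(url), "num_slashes": url.count("/")}


def link_counts(edges):
    """Count in-links and out-links per page from (source, target) pairs."""
    out_links = Counter(src for src, _ in edges)
    in_links = Counter(dst for _, dst in edges)
    return in_links, out_links


# Hypothetical toy data, for illustration only.
edges = [
    ("http://a.example/", "http://b.example/x/y"),
    ("http://c.example/", "http://b.example/x/y"),
    ("http://b.example/x/y", "http://a.example/"),
]
in_links, out_links = link_counts(edges)
features = url_features("http://b.example/x/y")
features["in_links"] = in_links["http://b.example/x/y"]
features["out_links"] = out_links["http://b.example/x/y"]
print(features)
```

Each document ends up with a small dictionary of numeric values, which is the kind of per-document feature vector a learning-to-rank algorithm takes as input.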