Thursday, July 16

Upcoming SIGIR 2009 Workshop Papers Available: Some Highlights

Update: See also my follow-up post on William Webber's HPP post.

The conference isn't until next week, but some of the papers from the workshops are already available. Here's a sneak peek at what to expect.

All the workshops look interesting, but I'm most keen on:

Redundancy, Diversity, and Interdependent Document Relevance
The program looks quite strong. This is a hot topic, especially with the upcoming TREC diversity task in this year's web track.

Search in Social Media
What really interests me here are the collaborative search panel and the Twitter keynote by Abdur Chowdhury.

The Future of IR Evaluation
I expect this to be a very popular and interesting workshop. I'm particularly interested in the focus on task- and goal-oriented evaluation. The papers, including the final proceedings, are available.

Here are some of the evaluation papers that piqued my interest:

Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment
...turkers not only are accurate in assessing relevance but in some cases were more precise than the original experts. Also, turkers tend to agree slightly more with the experts when the document is relevant, and less when it is not relevant.
Relative Significance is Insufficient: Baselines Matter Too
Our longitudinal survey of published IR results in SIGIR and CIKM proceedings from 1998–2008 has revealed that ad-hoc retrieval does not appear to have measurably improved.
Has there been no significant progress in the last decade? This is provocative.

A Model for Evaluation of Interactive Information Retrieval
In this paper, the authors define a new metric, "usefulness," that takes a goal/task-oriented approach and measures a system by how well it helps the user accomplish the goal.

A Plan for Making Information Retrieval Evaluation Synonymous with Human Performance Prediction
I'm quite excited by this paper. A brief excerpt:
We propose a TREC track or other group effort that will collect a large amount of human usage data on search tasks and then measure participating sites' ability to develop models that predict human performance given the usage data.
The Large-Scale Distributed Systems for IR workshop also has its papers and proceedings available for download. A few paper highlights:

Comparing Distributed Indexing: To MapReduce or Not?
The guys at Glasgow implemented and benchmarked several distributed indexing strategies on Hadoop: the original MapReduce indexing strategy outlined by Dean and Ghemawat, the Nutch strategy, and their own method, used in Terrier, which adapts their non-distributed single-pass strategy.
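To make the basic idea concrete, here's a tiny in-memory Python sketch of the indexing job Dean and Ghemawat describe: map emits (term, doc_id) pairs, and reduce gathers each term's postings into a sorted list. This is my own toy illustration, not the Glasgow code or anything from the paper, and the function names and sample documents are made up for the example.

```python
# Minimal in-memory sketch of MapReduce-style inverted indexing.
# Toy illustration only; the paper's experiments run on Hadoop.
from collections import defaultdict

def map_phase(docs):
    """Emit a (term, doc_id) pair for every term occurrence."""
    for doc_id, text in docs.items():
        for term in text.lower().split():
            yield term, doc_id

def shuffle(pairs):
    """Group emitted pairs by term, as the framework's shuffle would."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped

def reduce_phase(grouped):
    """Produce a sorted, de-duplicated postings list per term."""
    return {term: sorted(set(doc_ids)) for term, doc_ids in grouped.items()}

docs = {1: "to be or not to be", 2: "to index or not to index"}
index = reduce_phase(shuffle(map_phase(docs)))
print(index["to"])     # [1, 2]
print(index["index"])  # [2]
```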

The Curse of Zipf and Limits to Parallelization: A Look at the Stragglers Problem in MapReduce
Another MapReduce paper, this one by Jimmy Lin, looking at the impact of stragglers on running time. A good lesson to keep in mind when developing MR algorithms.
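The straggler effect is easy to see even in a toy simulation. Here's a quick Python sketch (my own hypothetical numbers and setup, not an experiment from the paper) showing that when per-term work follows a Zipf-like distribution, the reducer that draws the head terms dominates the running time, so adding reducers stops helping long before you'd expect:

```python
# Back-of-the-envelope illustration with made-up numbers, not results from the
# paper: with Zipf-distributed work, the heaviest reducer bounds the job time.
import random

NUM_TERMS = 100_000   # assumed vocabulary size for the toy example
NUM_REDUCERS = 64     # assumed number of reducers

# Zipf-like workload: processing the term at rank r costs roughly 1/r units.
work = [1.0 / rank for rank in range(1, NUM_TERMS + 1)]

# Assign each term to a reducer at random, a stand-in for hash partitioning.
random.seed(0)
loads = [0.0] * NUM_REDUCERS
for cost in work:
    loads[random.randrange(NUM_REDUCERS)] += cost

total = sum(loads)
ideal = total / NUM_REDUCERS   # a perfectly balanced reducer's share
straggler = max(loads)         # the slowest reducer sets the finish time
print(f"ideal share per reducer: {ideal:.2f}")
print(f"straggler's load:        {straggler:.2f}")
print(f"effective speedup: {total / straggler:.1f}x on {NUM_REDUCERS} reducers")
```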

Finally, the Learning to Rank Workshop has some of its papers available, but I haven't been able to take a look yet.
