Friday, July 17

TREC: Not the right forum for interactive retrieval evaluation

See also yesterday's post on SIGIR workshop papers.

In his latest post, William Webber argues that TREC is the wrong forum for the Human Performance Prediction (HPP) evaluation proposed by Mark Smucker at the Future of Evaluation workshop. William argues that once we move away from the Cranfield paradigm, an online setup with real users, not tied to a yearly conference, would work better than TREC. I think his argument has merit.

Going back to the tenets of HPP evaluation:
  • Evaluation should be predictive of user performance.
  • Evaluation should concern itself with both the user interface and the underlying retrieval engine.
  • Evaluation should measure the time required for users to satisfy their information needs.
In his paper, Mark proposes an effort that would in effect create an interaction pool, with possibly many participants plugging different retrieval engines into a canonical UI.
Because it involves user interaction and the interplay of the UI with accomplishing tasks, it's not clear that TREC would be the best host for this. It instead favors a less formal forum with significant web development experience that could facilitate rapid iteration on UI and system changes. This type of setup would also make it easy to change what interactions get logged.
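
To make that last point concrete, here is a minimal sketch of the kind of per-interaction record such a shared platform might log. This is purely my own illustration; the field names and event types below aren't part of Mark's proposal.

    # A rough sketch of a per-interaction record an "interaction pool" platform
    # might log. The schema is purely illustrative -- the HPP proposal does not
    # specify one.
    from dataclasses import dataclass, field
    from typing import Optional
    import time

    @dataclass
    class InteractionEvent:
        session_id: str            # one search session per user/task pairing
        engine_id: str             # which participant's engine served the results
        action: str                # e.g. "query", "click", "save", "mark_relevant"
        query: Optional[str] = None
        doc_id: Optional[str] = None
        rank: Optional[int] = None
        timestamp: float = field(default_factory=time.time)

    # Because events are just appended records, changing what gets logged only
    # means emitting new action types, not redesigning the study.
    log: list[InteractionEvent] = []
    log.append(InteractionEvent("s1", "engineA", "query", query="trec evaluation"))
    log.append(InteractionEvent("s1", "engineA", "click", doc_id="doc42", rank=3))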

Thursday, July 16

Upcoming SIGIR 2009 Workshop Papers Available, Some Highlights

Update: See also my follow-up post on William Webber's HPP post.

The conference isn't until next week, but some of the papers from the workshops are already available. Here's a sneak peek at what to expect.

All the workshops look interesting, but I'm most keen on:

Redundancy, Diversity, and Interdependent Document Relevance
The program looks quite strong. This is a hot topic, especially with the upcoming TREC diversity task in this year's web track.

Search in Social Media

What really interests me here is the collaborative search panel and the Twitter keynote by Abdur Chowdhury.

The Future of IR Evaluation workshop
I expect this to be a very popular and interesting workshop. I'm particularly interested in the focus on task- and goal-oriented evaluation. The papers are available, including the final proceedings.

Here are notable evaluation papers that piqued my interest:

Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment
...turkers not only are accurate in assessing relevance but in some cases were more precise than the original experts. Also, turkers tend to agree slightly more with the experts when the document is relevant, and less when it is not relevant.
Relative Significance is Insufficient: Baselines Matter Too
Our longitudinal survey of published IR results in SIGIR and CIKM proceedings from 1998–2008 has revealed that ad-hoc retrieval does not appear to have measurably improved.
Has there been no significant progress in the last decade? This is provocative.

A Model for Evaluation of Interactive Information Retrieval
In this paper, the authors define a new metric, "usefulness," that takes a goal/task-oriented approach and measures a system by how well it assists the user in accomplishing the goal.

A Plan for Making Information Retrieval Evaluation Synonymous with Human Performance Prediction
I'm quite excited by this paper. A brief excerpt:
We propose a TREC track or other group effort that will collect a large amount of human usage data on search tasks and then measure participating sites' ability to develop models that predict human performance given the usage data.
The Large-Scale Distributed Systems for IR workshop also has its papers and proceedings available for download. A few paper highlights:

Comparing Distributed Indexing: To MapReduce or Not?
The guys at Glasgow implemented and benchmarked different distributed indexing strategies using Hadoop. They test the original MapReduce indexing strategy outlined by Dean and Ghemawat, the Nutch strategy, and their own method used in Terrier, which adapts their non-distributed single-pass strategy.
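
As a refresher on the original Dean and Ghemawat approach they compare against, here's a toy, in-memory Python sketch of map/reduce indexing: the map step emits (term, posting) pairs, the shuffle groups them by term, and the reduce step assembles each posting list. It illustrates only the general strategy, not the Terrier or Nutch implementations benchmarked in the paper.

    # Toy illustration of MapReduce-style indexing: map emits (term, posting)
    # pairs, the shuffle groups them by term, reduce builds posting lists.
    from collections import defaultdict

    def map_phase(doc_id, text):
        for position, term in enumerate(text.lower().split()):
            yield term, (doc_id, position)

    def reduce_phase(term, postings):
        # In a real job each reducer writes its share of the inverted index to disk.
        return term, sorted(postings)

    docs = {1: "to be or not to be", 2: "to index or not to index"}

    # "Shuffle": group intermediate pairs by key, as the framework would.
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for term, posting in map_phase(doc_id, text):
            grouped[term].append(posting)

    index = dict(reduce_phase(term, postings) for term, postings in grouped.items())
    print(index["to"])   # [(1, 0), (1, 4), (2, 0), (2, 4)]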

The Curse of Zipf and Limits to Parallelization: A Look at the Stragglers Problem in MapReduce
Another MapReduce paper by Jimmy Lin, this one looking at the impact of stragglers on running time. A good lesson to keep in mind when developing MR algorithms.
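
As a back-of-the-envelope illustration (my own, not from the paper): with Zipf-distributed terms and a hash partitioner, the reducer that receives the most frequent term ends up with far more work than the average reducer, and the job's running time is bounded by that straggler.

    # Simulate Zipf-distributed keys and look at the resulting reducer skew.
    import random

    random.seed(0)
    num_terms, num_reducers, num_tokens = 10_000, 100, 1_000_000

    # Zipf-ish sampling: term of rank r is drawn with probability proportional to 1/r.
    weights = [1.0 / r for r in range(1, num_terms + 1)]
    tokens = random.choices(range(num_terms), weights=weights, k=num_tokens)

    # Hash-partition terms across reducers, as a MapReduce shuffle would.
    load = [0] * num_reducers
    for term in tokens:
        load[term % num_reducers] += 1

    print(f"mean reducer load: {sum(load) / num_reducers:.0f}")
    print(f"max reducer load:  {max(load)}")  # the straggler that bounds job runtime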

Finally, the Learning to Rank Workshop has some of its papers available, but I haven't been able to take a look yet.

Wednesday, July 15

LinkedIn Jumps Onto the Faceted Search Bandwagon with a Bang

Daniel has a post covering the launch of LinkedIn's new People Search Beta, which uses faceted search. I like it a lot! Some of the current facets are location, past schools, past companies, etc... very useful information.
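
For the curious, here's a rough Python sketch of the usual mechanics behind a faceted search UI: count the distinct values of each facet over the current result set, and let a selected value filter the results further. This is purely illustrative and says nothing about LinkedIn's actual implementation.

    # Minimal facet counting and drill-down over an in-memory result set.
    from collections import Counter

    results = [
        {"name": "A", "location": "Boston", "past_company": "IBM"},
        {"name": "B", "location": "Boston", "past_company": "Google"},
        {"name": "C", "location": "Seattle", "past_company": "IBM"},
    ]

    def facet_counts(hits, facet):
        return Counter(hit[facet] for hit in hits)

    def drill_down(hits, facet, value):
        return [hit for hit in hits if hit[facet] == value]

    print(facet_counts(results, "location"))           # Counter({'Boston': 2, 'Seattle': 1})
    print(drill_down(results, "past_company", "IBM"))   # the two IBM matches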

Tuesday, July 14

Knowing your data (and your domain) matters

Greg has an article for CACM, The biggest gains come from knowing your data. The article argues that carefully crafted features and judicious selection of algorithms to match your data provide significant performance gains over off-the-shelf algorithms and generic features. He highlights the Netflix contest and insights into user rating behavior as a key example of this.

He also posted his summary of the Collaborative Filtering with Temporal Dynamics paper.

Monday, July 13

How to Improve your chances of getting your paper accepted (at least at KDD)

Eamonn Keogh, a professor at UC Riverside, gave a tutorial at KDD 2009 titled: How to do good research, get it published in SIGKDD and get it cited!. Thanks to William Webber for his summary and for pointing this out to me.

The slides from the tutorial are now available online. While it was given at KDD, many of the same principles described apply to other conferences like CIKM, SIGIR, etc...

Here are a few steps that he outlines to make your paper more likely to get accepted:
  • Anchoring. Anchor your readers on the first page. This means a solid and captivating title, abstract, and introduction. Motivate your paper clearly.

  • Reproducibility. Make your experiment reproducible by telling your readers what you did and how you did it: parameters, algorithms, data pre-processing, etc... One of the most common causes of unreproducible results is complexity and parameters: be explicit, and try not to use algorithms whose parameters someone else won't be able to understand or replicate.

  • Unjustified choices are bad. Give an explanation for every choice you made, even if it was arbitrary.

  • Choose your words carefully. Words can be confusing: optimal, proved, significant, theoretically. Be sure to define any abbreviations early!

  • Use all your space. Don't leave empty space in your paper. Use it to show more results or give more detail.

  • Use Figures Effectively. Make good figures that clearly illustrate your point. (see the slides for examples of good and bad figures)

Read the tutorial for the Top Ten Reasons Your Paper Got Rejected. Here are a few highlights:
  1. The paper is out of scope for the conference.
  2. Not an interesting or important problem
  3. Sloppy paper: typos, unclear figures, and poor writing
  4. The experiments are not reproducible
  5. There was an easier way to solve the problem and you did not compare against it.
I'll add: the results are not compelling. You tried lots of things and nothing worked, or you managed only a very tiny improvement over the baseline. Negative results are important, but good results are more likely to be published: if your techniques don't produce results, you have to explain in detail why they didn't work, and doing that thoroughly takes a lot of time and effort that most people don't invest.

See also the Research Methods class taught by David Jensen here at UMass. You can see the schedule from this Spring's class.