Thursday, April 10

Nick Belkin ECIR 2008 Keynote: Some(what) Grand Challenges for Information Retrieval

Nick Belkin, professor at Rutgers, gave the opening keynote at ECIR 2008. One of his primary themes was the lack of emphasis on the user's role in Information Retrieval. Here are my notes from his talk.

Lack of user focus
  • none of the papers at ECIR and similar venues has the word "user" in the title
  • This is an example of the tenor of mainstream IR research. Prominent figures (e.g., Karen Sparck Jones) have pointed out that the operating conditions of real people using real systems have been neglected.
  • IR is really about helping people find information. We need to consider motivations, intentions, background information, the current task at hand, and so on.
  • While some progress has been made via the TREC (Cranfield) evaluation methodology, that setting largely ignores users and interactive retrieval (a minimal sketch of this style of batch evaluation follows the list below)
This leads to problems such as:
  • New ranking models and techniques are limited to incremental improvements
  • Improvements in TREC-style evaluation performance rarely lead to improvements in interactive systems
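
To make the contrast concrete, here is a minimal sketch (my own illustration, not something from the talk) of Cranfield-style batch evaluation: a ranked run is scored against fixed relevance judgments, with no user in the loop. The run and qrels data are hypothetical.

    # Cranfield/TREC-style batch evaluation: score a ranked run against
    # fixed relevance judgments (qrels). No user, no interaction.

    def average_precision(ranked_docs, relevant):
        """Average precision of one ranked list against static judgments."""
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_docs, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant) if relevant else 0.0

    # Hypothetical data: one system's ranking and the judged-relevant set.
    run = ["d3", "d1", "d7", "d2"]
    qrels = {"d1", "d2"}
    print(average_precision(run, qrels))  # 0.5 = (1/2 + 2/4) / 2
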
Challenges for the field
  1. Goals, tasks and intentions. These affect judgments of usefulness and the way people want to interact with information objects.
    • We need the ability to differentiate between these in principled ways that apply across many contexts; this should lead to design principles.
    • We need better ways of inferring them; most work in this area has relied on explicit definitions of tasks.
    • We need systems that effectively respond to different needs

  2. Interdisciplinary research barriers. There are significant barriers in the structure of research organizations to performing the interdisciplinary work that is necessary.

  3. Understanding interactions other than specified search. For example, people have difficulty specifying their queries (the most basic example is that people both search and browse). We need to be able to model the way search systems are used in the context of the other activities being conducted.

  4. Characterizing context – what is not context? A recent panel discussion led to the conclusion that everything is context. There is no principled way of identifying which factors are essential. We need ways to identify these features without explicitly asking users for them.

  5. Taking account of user affect and emotion. Mainstream IR and interactive retrieval have focused on the efficiency and effectiveness of the system, but the emotions people experience while interacting with information affect how they continue the experience. Making IR pleasurable is as important a goal as making it fast and efficient. For example, learning may be enhanced by uncertainty rather than well-being, by facilitating the sense that something significant is being explored. This work requires collaboration across disciplines to happen.

  6. Personalization. Most work treats personalization as relevance feedback over past behavioral evidence (click-throughs, viewed documents, dwell time, currently open documents). That work concerns which documents to retrieve, but little beyond that: the way results are displayed at different times and in different places, or tailored to different cognitive or developmental levels, remains open. (A sketch of the feedback-style approach appears below.)
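
As a concrete illustration (my own sketch, not from the talk) of the behavioral relevance feedback that mainstream personalization work relies on: a document's retrieval score is boosted using click and dwell-time evidence from past sessions. The signal names, threshold, and weights here are all assumptions.

    # Sketch of behavioral relevance feedback: re-rank results by boosting
    # documents the user previously clicked and dwelled on.
    # All signals and weights are hypothetical.

    DWELL_THRESHOLD = 30.0  # seconds; below this a click is treated as a skip

    def behavior_boost(doc_id, clicks, dwell_times):
        """Per-document boost derived from past interaction evidence."""
        if doc_id not in clicks:
            return 0.0
        dwell = dwell_times.get(doc_id, 0.0)
        return 1.0 if dwell >= DWELL_THRESHOLD else 0.2

    def personalize(results, clicks, dwell_times, alpha=0.3):
        """Blend the base retrieval score with the behavioral boost."""
        rescored = [
            (doc_id, (1 - alpha) * score + alpha * behavior_boost(doc_id, clicks, dwell_times))
            for doc_id, score in results
        ]
        return sorted(rescored, key=lambda pair: pair[1], reverse=True)

    # Hypothetical session: d2 was clicked and read for 45 seconds,
    # so it moves above d1 in the re-ranked list.
    results = [("d1", 0.9), ("d2", 0.7), ("d3", 0.6)]
    print(personalize(results, clicks={"d2"}, dwell_times={"d2": 45.0}))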

  7. Integration of IR in a task environment. People want to accomplish some other task; search is a distraction. The ultimate challenge is to arrange things so that people never need a separate interaction to find what they need. People shouldn't have to leave their tasks, but no such effort has left the experimental phase or been evaluated. This requires a deep understanding of the tasks people perform, and collaboration with application developers. [this reminds me of search as the ‘information layer’ that KSJ spoke of]

  8. Evaluation paradigms, especially interactive. TREC is unsuitable for evaluating interactive search, and the alternatives have severe limitations; the main problem is reproducibility. What evaluation measures are useful here? We have not succeeded in reconciling the desire for realism with the need for scientific method. One key is interaction data (e.g., from large web search engines), integrating log data with selective investigations of user behavior and studies. Privacy is a problem here (see the AOL log release of 2006), and the general research community may never really have access to this data. A possible alternative is a standard for what data should be collected during interactions and stored for reuse in test collections, though this is expensive. (A sketch of what such a record might contain follows.)
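
To illustrate what such a standard might cover, here is a minimal sketch of an interaction-log record. The field choices are my own guesses at the behaviors worth capturing, not a proposal from the talk.

    # Sketch of a standardized interaction record for reusable test
    # collections. All field choices are hypothetical.

    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class InteractionRecord:
        session_id: str        # pseudonymous, to protect privacy (cf. AOL 2006)
        query: str
        shown_doc_ids: list    # ranked list presented to the user
        clicked_doc_ids: list  # which results were selected
        dwell_seconds: dict    # doc_id -> time spent reading
        reformulated_to: str   # follow-up query, if any

    record = InteractionRecord(
        session_id="s-1042",
        query="ecir 2008 keynote",
        shown_doc_ids=["d3", "d1", "d7"],
        clicked_doc_ids=["d1"],
        dwell_seconds={"d1": 42.0},
        reformulated_to="belkin grand challenges",
    )
    print(json.dumps(asdict(record), indent=2))  # serialized for archiving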

  9. Formal models in IR (a parenthetical point). The formal models suffer from the problems identified by Sparck Jones: there is no room for the user or interaction, only matching and ranking techniques. (A sketch of one such model appears below.)
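
To make that point concrete, here is a minimal sketch of BM25-style scoring, one classic formal ranking model (my own illustration on a toy collection, not an example from the talk). Notice that every input is a property of the query, the document, or the collection; the user appears nowhere.

    # BM25 scoring: a classic formal ranking model. Every input is a
    # property of the query, document, or collection; no user anywhere.
    # Toy two-document collection for illustration only.

    import math

    k1, b = 1.2, 0.75  # standard BM25 parameters

    docs = {
        "d1": "information retrieval and users".split(),
        "d2": "ranking models for retrieval".split(),
    }
    avgdl = sum(len(words) for words in docs.values()) / len(docs)
    N = len(docs)

    def idf(term):
        df = sum(term in words for words in docs.values())
        return math.log((N - df + 0.5) / (df + 0.5) + 1.0)

    def bm25(query, doc_id):
        words = docs[doc_id]
        score = 0.0
        for term in query.split():
            tf = words.count(term)
            score += idf(term) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(words) / avgdl))
        return score

    print(sorted(docs, key=lambda d: bm25("retrieval models", d), reverse=True))
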
Summary
The challenge is to address the interactive nature of IR. We need models that incorporate the user as a central component, not just as a feedback mechanism. We may have to give up strict and formal models in favor of real and useful models of IR.

Wednesday, April 9

Daniel Tunkelang's The Noisy Channel Blog

Daniel, Chief Scientist at Endeca, has started a blog called The Noisy Channel.

I met Daniel at SIGIR 2006 and we got to know each other better at ECIR. I convinced him he should start a blog; it didn't take much effort, since he has a lot to say.

Daniel's first two posts cover the keynote addresses at ECIR:

Some(what) Grand Challenges for Information Retrieval
Nick Belkin, Rutgers University

Web Search: Challenges and Directions
Amit Singhal, Google

I'll try to write up some of my thoughts on the keynotes and other highlights from ECIR later in the week.

ICWSM Papers online

The ICWSM 2008 papers are online.

Via Matthew Hurst.