Wednesday, May 28

The most important research question in blog search

Tonight I watched the talk "On TREC Blog Track" that I mentioned yesterday.
Social media isn't about search, it's about content creation and interaction. What is the role that search has to play here?
In the conclusion Ian describes possible future directions of the blog track:
  • Following a story as different bloggers discuss it (story tracking)
  • Follow discussion of a single infamous blog post
  • Adaptive filtering of blogs
  • Automatic tagging of blog posts
To me, the first two directions sound like Topic Detection and Tracking (TDT) at different levels of granularity. Have blogs been used as a corpus for TDT tasks? I'm not too familiar with this area of research.

Another really interesting part is the Q&A. Specifically, Ian's response to the last question:
Search is a tool, not a task... Within the evaluation paradigm, I don't care how you find the ranked list of stuff... The problem of identifying the user task, what is the user trying to really do that we are abstracting and operationalizing into something you can measure in a lab setting. It's a critical question. It's something we have a standard of operationalizing; we have a standard way of making this an experiment in IR. This is how we have done search evaluation for a long time. So, we tend to try and cast problems in this way. But, one of the research questions, the most important research question is: how do you think about what people are actually doing and then how do you make this into something we can measure? This is what I am really interested in.
This sounds a lot like the discussion that Nick surfaced at ECIR, see previous discussion here and by Daniel.

One step towards measurement is to correlate search interaction log data with behavior observed in field studies. If you want to get started in this area, I highly recommend "What people think about when searching" [pdf presentation, mp3 podcast] by Dan Russell from Google's search quality group, given at Marti Hearst's SIMS i141 class. It's one of the best presentations on HCIR I've heard in a long time.


  1. Hi... glad you liked my talk.

    To answer your first question, blogs have not been used for a topic tracking or filtering experiment yet, although with the existing TREC data and the timestamp information in the posts, it's simple to set up.

    I have to say I'm flattered to be placed in the company of Nick Belkin, but something that perhaps didn't come out is that I'm coming from a laboratory-experimental perspective on IR, whereas Nick would go look at the actual users. Both of us have real challenges -- for me, I not only want to measure these things, but I want the measurements to be repeatable, which usually means using a Cranfield-like setting. I like to think we're pushing from two directions to a common point where hopefully we'll meet.

  2. Ian,

    Thanks for the comments.

    I see the value both in what TREC has done as well as Nick's point of view.

    As you say, the challenge is to make measurements repeatable and yet allow room for the person performing the task. Are there different classes of users, or at least prototypical user behavior patterns, that could be captured and incorporated into the topics and tasks? For example, whether the user is looking for introductory material or advanced material on a topic. To simulate interaction, perhaps the topic could include a context consisting of prior queries in the session and clicked results (or other logged features).

  3. Can you review the scour search engine?

  4. Anonymous, 6:16 AM EST

    So the question remains (at least for me):
    can we use the TREC blog corpus for TDT tasks?

    I would also like to ask if there is any other
    corpus, except the TDT corpus and the TREC blog corpus, which is freely distributed
    and which I can use for TDT tasks.