Fernando Diaz, a recent CIIR alumnus, has the first post: Blogs, queries, corpora. He's continuing the discussion that Iadh started on tasks for the TREC 2009 blog track (see my earlier post in response). Fernando focuses on the origins of the current TREC tasks and on deriving future tasks from the behavior of real-world users of blog search engines. Fernando writes,
"One question I hope will be resolved in the comments is where these query types came from. Are they derived from actual blog searchers?... One approach would be to inspect query logs to blog search engines for different retrieval scenarios and then improve performance for those scenarios."
He poses a very good question. I don't recall seeing any published research that analyzes user behavior from blog search log data. Ultimately, the problem comes back to a fundamental issue: academia struggles to create relevant and realistic test scenarios without access to log data from real-world systems. Still, I hope we can at least improve what we have today.
I would like to see TREC topics begin to model the interactive nature of search. A starting point is acknowledging that users enter multiple queries in order to find information. Today, each TREC topic is only a single query, which is unrealistic and overly simplistic. As a first step, I advocate multi-query topics built from query refinement chains. Evaluation would be performed on each query in the chain, and the per-query results combined into a score for the chain. Thoughts?
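To make the idea a little more concrete, here is a rough Python sketch of what scoring a query chain might look like. It rests on assumptions of my own: each query in the chain is scored with average precision against one shared set of relevant documents for the topic, and the chain score is the plain arithmetic mean of the per-query scores. The function names `average_precision` and `chain_score` are hypothetical, and how best to combine the per-query scores is exactly the open question.

```python
from typing import List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def chain_score(chain_rankings: List[List[str]], relevant: Set[str]) -> float:
    """Combine per-query scores for a refinement chain.

    Assumption: an unweighted mean over the queries in the chain. Other
    choices (weighting later queries more, taking the best query, etc.)
    are equally plausible.
    """
    if not chain_rankings:
        return 0.0
    scores = [average_precision(r, relevant) for r in chain_rankings]
    return sum(scores) / len(scores)


# Made-up example: a two-query chain for one topic.
relevant_docs = {"d1", "d2"}
chain = [
    ["d3", "d1", "d4"],  # results for the initial query
    ["d1", "d2", "d5"],  # results for the refined query
]
print(chain_score(chain, relevant_docs))
```

An unweighted mean treats every query in the chain as equally important. A weighting that rewards the user for reaching good results in fewer refinements would be another reasonable option.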