Tuesday, January 19

Challenges and Opportunities in Search

I'm a bit behind here because of the upcoming SIGIR deadline. However, I wanted to make sure to mention an article in the January CACM, New Search Challenges and Opportunities. The article highlights three main directions:
1) Web-scale information extraction
2) Real-time search: blogs and status updates
3) Task-based search: Time and location

Web-scale information extraction
In the first section, they highlight Oren Etzioni's work on TextRunner. It's an interesting project, but it was published in 2008. If you're interested in more recent and in-depth work, I suggest reading the students' theses: Michael Cafarella's Extracting and Managing Structured Web Data and Michele Banko's Open Information Extraction for the Web. From Michael's introduction:
TextRunner is an extractor for processing natural language Web text. WebTables extracts and provides applications on top of relations in HTML tables. Finally, Octopus provides integration services over extracted Structured Web data. Together, these three systems demonstrate that managing structured data on the Web is possible today, and also suggest directions for future systems.
The work on integration with Octopus was recently published as Data Integration for the Relational Web.

Blogs and real-time search
I thought this section added little to what has already been discussed at length in other forums. In particular, there was little useful discussion of Twitter and status updates. One thing of note was Susan Dumais' comment on the challenge of opinion analysis in blogs:
But rating postings as positive or negative, or figuring out whether they're aimed at an older or younger audience or have a left-leaning, right-leaning, or middle-of-the-road viewpoint, is challenging, she says.
A key challenge here is that simple term-based algorithms do not capture meaning in complex discourse.
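
As a toy illustration of why term counting falls short (my own sketch, not something from the article), here is a minimal lexicon-based scorer in Python. The word lists, the function names term_score and label, and the example sentences are all made up for illustration; real sentiment lexicons are far larger.

```python
# Hypothetical toy lexicons, assumed purely for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def term_score(text):
    """Count positive terms minus negative terms, ignoring order and context."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def label(text):
    """Map the raw term score to a coarse sentiment label."""
    score = term_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "I love this phone, the camera is great.",        # plainly positive
    "I would not call this a good phone.",            # negation
    "Great, another update that breaks everything.",  # sarcasm
]

if __name__ == "__main__":
    for text in examples:
        print(label(text), "|", text)
```

All three sentences come out as "positive", even though the second negates its only sentiment word and the third is sarcastic. A bag of terms has no way to see either, which is the kind of gap the comment above points at.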

Task-based search: Utilizing time and location
Jon Kleinberg highlights the need to integrate more tightly with user applications:
The real issue with a search engine is not just to serve up results, but to help people accomplish what they're trying to do...
They discuss it mainly in the context of mobile search and utilizing a user's location to help better identify search intent, an obvious evolution.

A few useful reminders of trends over the past few years, but nothing particularly new.

