Thursday, October 8

Eric Schmidt: Disruptive Innovations In Search Not Possible

Last week BusinessWeek had a series of articles interviewing Google Search Quality leaders. In the interview with CEO Eric Schmidt, there was a question about innovation:
The days when you can come in with some new idea and change everything are gone. It's a much more sophisticated set of problems than can be done with a small team coming up with a new development.
Instead, he says that disruptive ideas will focus on a smaller part of the system, e.g. an important new ranking feature that gets assimilated into the behemoth of the existing system. One example that comes to mind is Sep Kamvar's work on personalized PageRank at Kaltix, which Google acquired in 2003 and has since integrated.
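As an aside, the core idea behind personalized PageRank is small enough to sketch: instead of teleporting to a uniformly random page, the random surfer jumps back to pages the user cares about, which biases the whole ranking toward that user's interests. The toy graph, damping factor, and tolerance below are illustrative choices on my part, not details of Kaltix's actual system.

```python
def personalized_pagerank(graph, preference, damping=0.85, tol=1e-10):
    """graph: dict mapping node -> list of out-neighbors.
    preference: dict mapping node -> teleport probability (sums to 1).
    The only change from vanilla PageRank is that teleportation follows
    the preference vector rather than the uniform distribution."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    while True:
        # Teleport mass goes to the user's preferred pages.
        new = {n: (1 - damping) * preference.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: redistribute its mass via the preference vector.
                for m in nodes:
                    new[m] += damping * rank[n] * preference.get(m, 0.0)
        if sum(abs(new[n] - rank[n]) for n in nodes) < tol:
            return new
        rank = new

# Tiny example: a three-page web, with teleportation biased entirely to 'a'.
graph = {'a': ['b'], 'b': ['a', 'c'], 'c': ['a']}
pr = personalized_pagerank(graph, {'a': 1.0})
```

With a different preference vector you get a different ranking from the same link graph, which is exactly why precomputing these at web scale was the hard part.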

As Eric also mentions in the interview, a key obstacle for web search innovators is scale; in economic terms, the market has high barriers to entry.

Despite the barriers, I think he is wrong. There are still disruptive innovations left in search.

Wednesday, October 7

HCIR 2009: Proceedings Highlights

Daniel points out that the HCIR 2009 workshop proceedings are available. Here are a few highlights:
  • Modeling Searcher Frustration by Henry Feild
    Henry is a labmate who recently ran an interesting user study. He analyzed the affective mental state of users during search tasks in order to detect 'frustration', with the goal of predicting when a user is frustrated based on observable query log data. Some of his results:

    1) Users who get frustrated tend to stay frustrated
    2) Frustration tends to increase with the number of queries submitted
    3) Certain users are more predisposed to being frustrated than others
    4) Frustration levels depend on the type of task

  • Using Twitter to Assess Information Needs: Early Results by Max Wilson
    They analyze 189,000 tweets, collected by retrieving 100 results for each of 10 search queries hourly over a two-week period. Their goal was to understand the kinds of things people are looking for.
  • I Come Not to Bury Cranfield, but to Praise It by Ellen Voorhees
    She argues that the very simplified (impoverished) role of the user in Cranfield is necessary in order to run highly controlled experiments. A key challenge is the cost of judging results. She says,
    Modifications as small as moving from MAP to a more user-focused measure like precision at ten documents retrieved require larger topic sets for a similar level of confidence. More radical departures will require even larger topic sets.
  • Freebase Cubed: Text-based Collection Queries for Large, Richly Interconnected Data Sets by David Huynh, creator of Parallax.
    David explores some of the challenges of presenting faceted interfaces over large, heterogeneous domain models. He writes,
    Any large data set such as Freebase that contains a large number of types and properties accumulated over actual use rather than fixed at design time poses challenges to designing easy-to-use faceted browsers. This is because the faceted browser cannot be tuned with domain knowledge at design time, but must operate in a generic manner, and thus become unwieldy.
  • Usefulness as the Criterion for Evaluation of Interactive Information Retrieval by Michael Cole, et al. from Belkin's group at Rutgers.

    The paper argues that purely relevance-based measures fail to capture whether a system actually helped a user accomplish their task. They propose a method to measure 'usefulness'.
    ... usefulness judgment can be explicitly related to the perceived contribution of the judged object or process to progress towards satisfying the leading goal or a goal on the way. In contrast to relevance, a judgment of usefulness can be made of a result or a process, rather than only to the content of an information object. It also applies to all scales of an interaction.
  • Towards Timed Predictions of Human Performance for Interactive Information Retrieval Evaluation by Mark Smucker

    He advocates an extension of the Cranfield paradigm that measures the user's ability to find relevant documents within a timed environment. The overall goal is to develop a model of user behavior in order to inform decisions about which UI and search features offer the most opportunity for improvement. They use GOMS to estimate the time for users to complete a task with a given interface. He writes,
    The acronym GOMS stands for Goals, Operators, Methods, and Selections. In simple terms, GOMS is about finding the sequence of operations on a user interface that allows the user to achieve the user’s goal in the shortest amount of time.
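To make the GOMS idea concrete, here is a back-of-the-envelope sketch using the commonly cited Keystroke-Level Model operator times from Card, Moran, and Newell. The query and operator sequence are invented for illustration; they are not taken from Smucker's paper.

```python
# Approximate per-operator times (seconds) from the Keystroke-Level Model.
KLM_TIMES = {
    'K': 0.28,  # press a key (average typist)
    'P': 1.10,  # point at a target with the mouse
    'H': 0.40,  # home hands between keyboard and mouse
    'M': 1.35,  # mentally prepare for the next step
}

def estimate_time(operators):
    """Sum the per-operator time estimates for a sequence of operations."""
    return sum(KLM_TIMES[op] for op in operators)

# Typing a 5-letter query, then moving to the mouse and clicking 'Search':
# M (think of query) + 5*K (type it) + H (to mouse) + P (point) + K (click)
seq = ['M'] + ['K'] * 5 + ['H', 'P', 'K']
t = estimate_time(seq)  # roughly 4.5 seconds for this interface
```

Comparing such estimates across alternative interfaces (say, one with query autocompletion that saves keystrokes) is the kind of analysis that can flag where a UI change would pay off most.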
That's all for now, although there is a lot more interesting work in the proceedings!
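One small addendum on Voorhees's point about evaluation measures: MAP and precision at ten reward different things. Average precision rewards ranking relevant documents early, while P@10 only counts how many appear in the top ten. A quick sketch with invented relevance judgments:

```python
def average_precision(ranked_rels, total_relevant):
    """ranked_rels: list of 0/1 relevance judgments in rank order.
    Averages precision over the ranks where relevant documents appear."""
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / total_relevant if total_relevant else 0.0

def precision_at(ranked_rels, k=10):
    """Fraction of the top k documents that are relevant."""
    return sum(ranked_rels[:k]) / k

# Two runs with identical P@10 but very different AP: run1 puts its
# relevant documents at ranks 1-2, run2 buries them at ranks 9-10.
run1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
run2 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Averaging `average_precision` over a set of topics gives MAP; the instability of the coarser P@10 per topic is what drives the larger topic sets she mentions.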

Tuesday, October 6

Stalk My Semester: Information Retrieval and Stats

You may have noticed my post frequency decreasing. It's inversely proportional to the amount of homework. This semester I am taking three classes, all of which relate to my research interests (for a nice change).

CS646: Information retrieval
The graduate IR class with James. The slides from the class are available for you to follow along. For texts we're using:
  1. B. Croft, D. Metzler, and T. Strohman, Search Engines: Information Retrieval in Practice. Addison Wesley, February 2009. [amazon]

  2. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008. [cup]. The authors also make it available online.
This is the first time there have been good, comprehensive texts for IR. I recommend picking them up if IR is your area of interest.

STAT 607: Mathematical Statistics I
This is an introductory class on statistical theory taught by Michael Lavine. We're learning R for data analysis. The textbook, Introduction to Statistical Thought is available for free download. It has lots of good R examples.

CS791: Information Retrieval Seminar on User Modeling
Bruce is leading a seminar on User Modeling for IR. Last week we focused on query term weighting, led by Michael. This week we'll cover Query Reformulation techniques. There is no website or text for this course, but I'll try to provide some links to relevant papers and presentations as we cover material.