Friday, January 22

Bing's Stefan Weitz on Microsoft's Vertical Strategy

SearchEngineLand has an interview with Stefan Weitz, a Director at Bing Search. The interview highlights some of Bing's future directions and challenges in search. One emphasis is on better supporting users' tasks. Stefan says,
Relevancy is relative. It is about the intent of the user, first of all. What is the user trying to do? Then, secondly, what do you know about the user or the query that could help to better refine the results?
I disagree with the above statement. By definition, a document is relevant if it satisfies a user's "information need". I think he's really trying to say that too often we make the mistake of removing the user from the equation and creating a universal relevance judgment that holds across all users who issue a query.

The interview goes on to talk about how Microsoft is investing in technologies to support complex decision making in vertical categories like travel, health, and shopping. He highlights Farecast. However, the current Bing results for the query "plan a trip to Florida" make it clear that there is still a long way to go.

The article goes on to detail Microsoft's "vertical" strategy as a means of differentiation:
We will continue to introduce these verticals, in pretty short order, frankly. The sum of those parts will become a very differentiated experience that will expand how people think about search...
If you missed it, yesterday Microsoft started rolling out Bing Recipe Search.

Thursday, January 21

Google examines synonym effectiveness in query expansion

Google has used synonyms for query expansion for several years now. It is part of their attempt to find what you mean, not just what you type. Steven Baker, an engineer on the quality team, wrote a post covering a recent examination of synonym usage in query expansion. He writes,
...our measurements show that synonyms affect 70 percent of user searches across the more than 100 languages Google supports. We took a set of these queries and analyzed how precise the synonyms were, and were happy with the results: For every 50 queries where synonyms significantly improved the search results, we had only one truly bad synonym.
Another tidbit is that Google is expanding their highlighting of synonyms in search result summaries.

Lastly, a tip if you get stuck with one of the 1 in 50 queries where synonyms go bad:
You can also turn off a synonym for a specific term by adding a "+" before it or by putting the words in quotation marks.
Bill Slawski has good coverage of the post, and previous work on synonym usage, including Steven's patent, Determining query term synonyms within query context.
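
To make the "+" and quotation-mark tip concrete, here is a toy sketch of synonym expansion that skips any term the user has marked as literal. It is only an illustration under my own assumptions: the `SYNONYMS` table and the `expand_query` function are hypothetical, and Google's actual system chooses synonyms based on query context rather than a fixed lookup table.

```python
import re

# Hypothetical synonym table, for illustration only. A real engine would
# pick synonyms based on the surrounding query context.
SYNONYMS = {
    "pictures": ["photos", "images"],
    "car": ["auto", "automobile"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return (term, alternatives) pairs for each term in the query.

    Terms prefixed with "+" or wrapped in quotation marks are treated as
    literal and get no alternatives.
    """
    # A token is either a complete quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    expanded = []
    for token in tokens:
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            # Quoted phrase: match exactly as typed.
            expanded.append((token.strip('"'), []))
        elif len(token) > 1 and token.startswith("+"):
            # "+" prefix: synonyms are turned off for this term.
            expanded.append((token[1:], []))
        else:
            expanded.append((token, list(synonyms.get(token.lower(), []))))
    return expanded


if __name__ == "__main__":
    for q in ["car pictures", "+car pictures", '"car pictures" cheap']:
        print(q, "->", expand_query(q))
```

In this sketch, "car pictures" expands both terms, "+car pictures" expands only "pictures", and the quoted phrase is left untouched.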

Tuesday, January 19

Challenges and Opportunities in Search

I'm a bit behind here because of the upcoming SIGIR deadline. However, I wanted to make sure to mention an article in the January CACM, New Search Challenges and Opportunities. The article highlights three main directions:
1) Web-scale information extraction
2) Real-time search: blogs and status updates
3) Task-based search: Time and location

Web-scale information extraction
In the first section, they highlight Oren Etzioni's work on TextRunner. It's an interesting project, but it was published in 2008. If you're interested in more recent and in-depth work I suggest reading the students' theses: Michael Cafarella, Extracting and Managing Structured Web Data and Michele Banko, Open Information Extraction for the Web. From Michael's introduction,
TextRunner is an extractor for processing natural language Web text. WebTables extracts and provides applications on top of relations in HTML tables. Finally, Octopus provides integration services over extracted Structured Web data. Together, these three systems demonstrate that managing structured data on the Web is possible today, and also suggest directions for future systems.
The work on integration with Octopus was recently published, Data Integration for the Relational Web.

Blogs and real-time search
I thought this section added little to what has already been discussed at length in other forums. In particular, there was little useful discussion of Twitter and status updates. One thing of note was Susan Dumais' comment on the challenge of opinion analysis in blogs:
But rating postings as positive or negative, or figuring out whether they're aimed at an older or younger audience or have a left-leaning, right-leaning, or middle-of-the-road viewpoint, is challenging, she says.
A key challenge here is that simple term-based algorithms do not capture meaning in complex discourse.

Task-based search: Utilizing time and location
Jon Kleinberg highlights the need to integrate more tightly with user applications,
The real issue with a search engine is not just to serve up results, but to help people accomplish what they're trying to do...
They discuss it mainly in the context of mobile search and utilizing a user's location to help better identify search intent, an obvious evolution.

A few useful reminders of trends over the past few years, but nothing particularly new.