Saturday, April 17

NESCAI 2010 Keynote: Knowledge Representation and Reasoning for Web Search

I'm at NESCAI here at UMass this weekend. I'm liveblogging the keynote talk by Ron Brachman.

Some opportunities for Knowledge Representation and Reasoning in Web Search and Advertising

by Ron Brachman

Background on Knowledge Representation
Dartmouth – 1956
- Summer research project on AI
- “It may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view, forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others.” – McCarthy, Minsky, and crew

Fundamental idea
- A good part of thought involves a store of represented beliefs and procedures that operate on them
- … to produce new beliefs
- … that impact (decisions on) what to do
- This was a radical idea that put the focus on the knowledge: what can be known, and how, and what follows from what is known
- Knowledge, along with its representation and the procedures for manipulating it, then becomes a first-class object of study

The Field
- Inferences = computation of entailments
- Complexity of reasoning and its practical ramifications
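
(My own aside, not from the talk: to make "inference = computation of entailments" concrete, here is a minimal forward-chaining sketch. The facts, rules, and the `forward_chain` name are all invented for illustration; real KR systems use far richer logics, and that is exactly where the complexity concerns come in.)

```python
def forward_chain(facts, rules):
    """Return every fact entailed by `facts` under simple if-then `rules`.

    Each rule is a (premises, conclusion) pair: once all premises are
    known, the conclusion is added. Repeats until nothing new is derived.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy knowledge base, invented for illustration.
facts = {"tuscany_is_in_italy", "user_wants_trip_to_tuscany"}
rules = [
    (("user_wants_trip_to_tuscany", "tuscany_is_in_italy"),
     "user_wants_trip_to_italy"),
    (("user_wants_trip_to_italy",), "user_may_need_flight"),
]
entailed = forward_chain(facts, rules)
```

Even this naive loop rescans every rule on each pass, which hints at why the cost of reasoning matters in practice.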

The Internet: Challenges on the web
- An overview of publishers -> aggregators/intermediaries -> consumers and searchers
- Supported by advertising

Why this is interesting
- Massive amounts of data
-- most of it unstructured
-- some metadata, micro-formats, "deep web"
-- Abundance of statistically significant behaviors

- Massive numbers of people
-- social connectivity

- Multi-way ecosystem
-- Users, advertisers, publishers (plus aggregators, facilitators)

- Huge opportunities: Web search is still crude

Two key scientific areas on the web

- "Finding" science and systems
(Not necessarily pure IR... DARPA direct brain implant experiments)
-- web-scale IR - 30 B pages, 3 keywords, a fraction of a second
-- satisfaction of rich, varied user needs
-- proactive information supply
-- "social search"
- Computational advertising
-- Multi-factor matching challenges
-- Microeconomic challenges

Moving Beyond "Search"
- First generation - "on page" text data

- Second generation - use of off-page, web-specific data
-- link connectivity, PageRank, anchor text
-- Clickthrough data (what results people click on)

- Third generation - answer "the need behind the query"
-- semantic analysis
-- Integrate multiple sources of data (universal search)
-- Context determination (spatial, previous queries, user profile)
-- Help the user (UI, spell checking, query refinement, ...)
-- integration of search and text analysis
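
(Another aside of mine, not from the talk: the second-generation link signal mentioned above, PageRank, can be sketched with a few lines of power iteration. The toy link graph, the 0.85 damping factor, and the iteration count are my assumptions for illustration; this is not how any production engine computes it.)

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, invented for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, page "c" collects links from three pages and ends up ranked highest, while "d", which nothing links to, ends up lowest.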

User Needs
- Broder's taxonomy of queries (informational, navigational, transactional)

Third Generation Search
- Focus on the user need and answer it directly
- e.g. "I want to book a vacation in Tuscany"
- (mentioned vertical search and specialized "application" websites to accomplish your task)
- Rich search results for restaurants, hotels, etc...

Third Generation Search Challenges
- How to better take advantage of structure
-- Structure - "deep web" - hidden behind DB interfaces
-- Semi-structured information (use ML to determine structure)
-- Structure of the world

(Still out there: How do you connect the model of extracted structured information to a model of world knowledge?)

- User Intent
- Context

- Core "algorithmic" search: still a long way to go. Largely offline ML at scale. From an AI perspective, how can we harness background knowledge to improve learning?

From IR to Information Supply
- A shift from explicit, user-driven demand for information to active information supply
- Some simple examples: news alerts, automatic linking of content

The Power of Social Media
- How can we turn "folksonomies" into usable KR structures? ... and do reasoning over them?

Computational Advertising Challenges
- match ads to users
- use taxonomy of queries and of ads to match intent
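
(One more aside of mine: a rough sketch of what matching intent through a shared taxonomy could look like. It assumes the query and each ad have already been classified into category paths; the paths, ad names, and scoring by shared path depth are all hypothetical, not anything described in the talk.)

```python
def shared_depth(path_a, path_b):
    """Number of leading taxonomy levels two category paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def match_ads(query_category, ads):
    """Rank ads by taxonomy overlap with the query; drop ads with no overlap."""
    scored = [(shared_depth(query_category, cat), name)
              for name, cat in ads.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    # Deepest overlap first; ties broken alphabetically by ad name.
    return [name for score, name in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Hypothetical taxonomy paths, invented for illustration.
ads = {
    "tuscany_villa_rentals": ("travel", "lodging", "italy"),
    "cheap_flights": ("travel", "flights"),
    "running_shoes": ("retail", "footwear"),
}
query_category = ("travel", "lodging", "italy")  # e.g. a Tuscany vacation query
ranked = match_ads(query_category, ads)
```

The hard parts, of course, are the ones this skips: classifying queries and ads into the taxonomy in the first place, and doing it at web scale and latency.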

Huge opportunities for knowledge representation and reasoning... information extraction, matching, representation and reasoning with user intent.

Fundamental challenges
- scale
- latency
- statistical vs. knowledge context

Friday, April 16

Battelle interviews on search, Siri highlighted

Gord Hotchkiss from SELand has two interviews with John Battelle on the future of search. You can read Part I and Part II.

One interesting point is the knowledge gap when making a complex decision outside your area of expertise. The problem, as John puts it, is:
I don’t even know what I don’t know, and to expect search to tell you what I don’t know is expecting more than search can deliver.
Other than solving that hard problem, the interview discusses the shift towards applications (that sometimes utilize search) to solve tasks. In the interview John highlights Siri. Siri is a digital personal assistant accessed via your iPhone. From their website,
You can ask Siri to find a romantic place for dinner, tell you what’s playing at a local jazz club or get tickets to a movie for Saturday night.
Siri builds on expertise the team gained while working on the DARPA CALO project for SRI,
CALO stands for Cognitive Assistant that Learns and Organizes. The name was inspired by the Latin word calonis, “soldier’s servant,” because DARPA’s goal is to create a cognitive system that can reason, learn, and respond to surprise in order to assist in military situations.
Siri is at the intersection of IR and agent systems. It moves search towards accomplishing possible tasks in your current context rather than simply returning search results. Siri looks quite young, so we'll see how the vision develops. You can check out Siri's blog for more information.