Saturday, April 17

NESCAI 2010 Keynote: Knowledge Representation and Reasoning for Web Search

I'm at NESCAI here at UMass this weekend. I'm liveblogging the keynote talk by Ron Brachman.

Some opportunities for Knowledge Representation and Reasoning in Web Search and Advertising

by Ron Brachman

Background on Knowledge Representation
Dartmouth – 1956
- Summer research project on AI
- “it may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others.” – McCarthy, Minsky, and crew

Fundamental idea
- A good part of thought involves a store of represented beliefs and procedures that operate on them
- … to produce new beliefs
- … that impact (decisions on) what to do
- This was a radical idea that put the focus on the knowledge: what can be known, and how, and what follows from what is known
- Knowledge (its representation and the procedures for manipulating it) then becomes a first-class object of study

The Field
- Inferences = computation of entailments
- Complexity of reasoning and its practical ramifications
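
(Not from the talk: a minimal sketch of what "inference = computation of entailments" can look like, assuming a toy knowledge base of propositional Horn rules. The function name `forward_chain` and the facts and rules below are hypothetical illustrations, and forward chaining is just one simple inference procedure among many.)

```python
def forward_chain(facts, rules):
    """Return every fact entailed by `facts` under the given Horn rules.

    Each rule is a pair (premises, conclusion): if all premises are known,
    the conclusion becomes known too.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy knowledge base
rules = [
    (("is_hotel", "in_tuscany"), "tuscany_lodging"),
    (("tuscany_lodging",), "relevant_to_tuscany_trip"),
]
entailed = forward_chain({"is_hotel", "in_tuscany"}, rules)
```

Each pass over the rules is cheap, but the number of passes grows with the number of derivable facts, which hints at why the complexity of reasoning matters in practice.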

The Internet: Challenges on the web
- An overview of publishers -> aggregators/intermediaries -> consumers and searchers
- Supported by advertising

Why this is interesting
- Massive amounts of data
-- most of it unstructured
-- some metadata, micro-formats, "deep web"
-- Abundance of statistically significant behaviors

- Massive numbers of people
-- social connectivity

- Multi-way ecosystem
-- Users, advertisers, publishers (plus aggregators, facilitators)

- Huge opportunities: Web search is still crude

Two key scientific areas on the web

- "Finding" science and systems
(Not necessarily pure IR... DARPA direct brain implant experiments)
-- web-scale IR - 30 B pages, 3 keywords, a fraction of a second
-- satisfaction of rich, varied user needs
-- proactive information supply
-- "social search"
- Computational advertising
-- Multi-factor matching challenges
-- Microeconomic challenges

Moving Beyond "Search"
- First generation - "on page" text data

- Second generation - use of off-page, web-specific data
-- link connectivity, PageRank, anchor text
-- Clickthrough data (what results people click on)

- Third generation - answer "the need behind the query"
-- semantic analysis
-- Integrate multiple sources of data (universal search)
-- Context determination (spatial, previous queries, user profile)
-- Help the user (UI, spell checking, query refinement...)
-- integration of search and text analysis
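
(Not from the talk: a minimal sketch of the link-connectivity signal listed under the second generation, using the textbook power-iteration form of PageRank. The damping factor 0.85, the iteration count, and the three-page graph are assumptions chosen for illustration; this is not a description of any search engine's production ranking.)

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page receives the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a"
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" ends up ranked highest because it has the most inbound links, and "b" lowest because nothing links to it.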

User Needs
- Broder's taxonomy of queries (informational, navigational, transactional)

Third Generation Search
- Focus on the user need and answer it directly
- e.g. "I want to book a vacation in Tuscany"
- (mentioned vertical search and specialized "application" websites to accomplish your task)
- Rich search results for restaurants, hotels, etc...

Third Generation Search Challenges
- How to take better advantage of structure
-- Structure - "deep web" - hidden behind DB interfaces
-- Semi-structured information (use ML to determine structure)
-- Structure of the world

(Still out there: How do you connect the model of extracted structured information to a model of world knowledge?)

- User Intent
- Context

- Core "algorithmic" search: still a long way to go. Largely offline ML at scale. From an AI perspective, how can we harness background knowledge to improve learning?

From IR to Information Supply
- A shift from explicit, user-driven demand for information to active information supply
- Some simple examples: news alerts, automatic linking of content

The Power of Social Media
- How can we turn "folksonomies" into usable KR structures? ... and do reasoning over them?

Computational Advertising Challenges
- match ads to users
- use taxonomy of queries and of ads to match intent
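
(Not from the talk: a minimal sketch of taxonomy-based matching, assuming queries and ads have already been classified into slash-separated category paths. The category names, ad identifiers, and the shared-prefix scoring rule are all hypothetical; a real system would also have to handle bids, budgets, and click prediction.)

```python
def shared_prefix_depth(path_a, path_b):
    """Count how many leading taxonomy levels two category paths share."""
    depth = 0
    for a, b in zip(path_a.split("/"), path_b.split("/")):
        if a != b:
            break
        depth += 1
    return depth


def rank_ads(query_category, ads):
    """Return ad ids sharing at least one taxonomy level with the query,
    deepest match first (ties broken alphabetically by ad id)."""
    scored = [
        (shared_prefix_depth(query_category, category), ad_id)
        for ad_id, category in ads.items()
    ]
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [ad_id for depth, ad_id in ordered if depth > 0]


# Hypothetical ad inventory, keyed by ad id
ads = {
    "tuscan_villa": "travel/lodging/italy",
    "paris_hotel": "travel/lodging/france",
    "laptop_sale": "electronics/computers",
}
matches = rank_ads("travel/lodging/italy", ads)
```

Here a query classified under travel/lodging/italy matches the villa ad most closely, then the Paris hotel, and never the laptop ad.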

Huge opportunities for knowledge representation and reasoning... information extraction, matching, representation and reasoning with user intent.

Fundamental challenges
- scale
- latency
- statistical vs. knowledge context


  1. Once again, I have to scratch my head when I hear things like this:

    'Third generation - answer "the need behind the query"'

    Is he kidding? IR/search has *always* been about answering the need behind the query. The need behind the query is first generation search. It's been talked about that way since the 1960s. It's only in recent years, with the advent of web search and the ease with which we can use clickthrough data as a substitute for real Information Retrieval, that we've forgotten that fact.

  2. The "generations of search" characterization is a bit artificial and somewhat inaccurate. However, it does very roughly reflect the evolution of web search engines. After all, this was a Y! web-centric talk.