Tuesday, July 20

SIGIR 2010 Keynote Address: Refactoring Search by Gary Flake

SIGIR 2010 coverage is starting. You can also follow the coverage on Twitter, #sigir2010. Here are the raw notes from the first keynote address.

Refactoring Search by Gary Flake
- aka Zoomable UIs, IR, and the Uncanny Valley
- Bing search meets Pivot.

- 50 gigs of scans of the Seattle Post-Intelligencer, 600 dpi
- a proof of concept.

- take raw data; combine it with metadata for faceted navigation
- A look at Census data on death. A very novel way of navigating the dataset
- Between search and browsing.

Web Search Retrospective

What's worked well:
- instant answers
- spell correction
- vertical tabs
- query suggestions
- query completion
- grouping results

- The biggest improvement is in overall index scale
- Some improvement in core relevance
- But, this list is actually pretty modest

What hasn't worked as well
- Natural language queries
- richer representations for results
- richer presentation for one result
- clustering (visual or otherwise)

A lack of fluidity is part of the problem

Grokker (RIP, 2009)
- The sexiest search experience that no one was going to use.

Instead of discrete shifts from one query to the next, can we make it a more fluid, interactive process?

Uncanny Valley
- As you increase the sophistication, it becomes more pleasing until it becomes "too real", and then they feel like zombies.

Discrete vs. Continuous Interactions
CGI: stick figures -> The Simpsons -> Toy Story -> Polar Express -> Avatar
UIs: text terminal -> web 1.0 -> rich client -> overambitious Ajax -> good Zoomable UIs
Search: Grep -> Altavista -> present day search engines -> Grokker -> ???

Surpassing the uncanny valley is exceedingly difficult because it requires excellence in science, technology, and.

Our dilemmas
- We are already familiar with the dilemma of precision and recall.
- There exists a similar dilemma around scale, fluidity, and complexity.

Zoomable UIs and Similarities to IR

Deep Zoom items
* each tile is an image file
* each level is a set of image files in a folder
* each pyramid is a set of folders with image tiles for each level
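To make the tile / level / pyramid structure concrete, here is a minimal sketch of how the levels of such a pyramid could be computed. The 256-pixel tile size, the absence of tile overlap, and the function name `pyramid_levels` are assumptions for illustration; they were not given in the talk.

```python
import math

def pyramid_levels(width, height, tile_size=256):
    """Return (level, width, height, columns, rows) for every pyramid level.

    Level 0 is the smallest (1x1) image; each following level doubles the
    resolution until the full-size image is reached. The tile size is an
    assumed value, not one stated in the talk.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)      # downsampling factor at this level
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        cols = math.ceil(w / tile_size)       # tiles across
        rows = math.ceil(h / tile_size)       # tiles down
        levels.append((level, w, h, cols, rows))
    return levels

if __name__ == "__main__":
    for level, w, h, cols, rows in pyramid_levels(1024, 512):
        print(f"level {level}: {w}x{h} px, {cols * rows} tile(s)")
```

Each level would map to one folder and each (column, row) pair to one image file, so a viewer only needs to fetch the tiles that intersect the current viewport at the current zoom level.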

Deep Zoom collections
* thumbnails are packed onto shared tiles
* loading 100s of images requires loading only a few tiles.
* very simple: hierarchical file structure with XML description and metadata (no db)
- The fidelity of the experience is independent from the size of the object.
- The trick is to build the pyramid on the backend
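A rough back-of-the-envelope sketch of why packing thumbnails onto shared tiles keeps the number of requests low. The 256-pixel tile size and the helper name `tiles_needed` are assumptions, and the sketch ignores how thumbnails are actually ordered on the tiles.

```python
import math

def tiles_needed(num_items, thumb_size, tile_size=256):
    """Estimate how many shared tiles hold num_items square thumbnails.

    thumb_size is the thumbnail edge length in pixels at the current zoom
    level; tile_size is an assumed value, not one stated in the talk.
    """
    if thumb_size <= 0 or thumb_size > tile_size:
        raise ValueError("thumb_size must be between 1 and tile_size")
    per_side = tile_size // thumb_size        # thumbnails along one tile edge
    per_tile = per_side * per_side            # thumbnails per shared tile
    return math.ceil(num_items / per_tile)

if __name__ == "__main__":
    for size in (16, 32, 64):
        print(f"500 thumbnails at {size}px: {tiles_needed(500, size)} tile(s)")
```

Under these assumptions, 500 thumbnails at 16 pixels fit on 2 tiles, while the same 500 at 64 pixels need 32 tiles, which is why the zoomed-out view of a large collection stays cheap to load.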

The net outcome is that the user feels in control. "It's like having super human powers" to change levels of

Why user control is essential
- they feel empowered to explore
- Actions are more clearly invertible

Lessons from ZUIs to apply to IR
- Preprocess on the backend
- Assume the front end can do a lot
- Build UI around continuous interactions
- Use asynchronous I/O between endpoints
- Use the two in combination to reinforce one another (left versus right brain)

Higher level goals
- Turn the present discrete mode of search interaction into a continuous dialogue
- Support fluid interactions that are powerful, informative, and fun
- Scale to thousands of items within the user / client interactions

The biggest challenge is in dynamic generation of collections on the backend

Server-side IR problems
- ranking, facet determinations
- cleaning / augmenting bipartite graph

Pivot + search architecture
- Combines the Bing API with a thumbnail cache so that Pivot can be used to explore search results
- The UI supports a novel way of analysing a larger corpus of pages across multiple queries.
-- e.g. dpreview.com is prominent across multiple different queries about camera reviews
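The cross-query observation above can be sketched as a simple aggregation: for each domain, count how many distinct queries returned at least one result from it. This is an illustrative sketch only; the function names, the input shape (a dict of query to result URLs), and the sample URLs are hypothetical, and no real Bing API call is made.

```python
from collections import Counter
from urllib.parse import urlparse

def normalize_domain(url):
    """Lower-case the host name and drop a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def domains_across_queries(results_by_query):
    """Count the number of distinct queries in which each domain appears."""
    counts = Counter()
    for urls in results_by_query.values():
        # A set ensures a domain is counted once per query, however many
        # of its pages appear in that query's results.
        counts.update({normalize_domain(u) for u in urls})
    return counts

if __name__ == "__main__":
    # Hypothetical result lists, for illustration only.
    results = {
        "camera reviews": ["http://www.dpreview.com/reviews/a",
                           "http://example.com/cameras"],
        "dslr review": ["http://www.dpreview.com/reviews/b",
                        "http://example.org/dslr"],
        "compact camera review": ["http://dpreview.com/reviews/c",
                                  "http://www.dpreview.com/reviews/d"],
    }
    print(domains_across_queries(results).most_common(3))
```

A count like this could serve as one facet in the collection, letting the user pivot on sites that recur across related queries.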

First: do no harm
- Linear order must be obvious
- First result or instant answer is prominent
- First 4 or so items are easily visible
- Preserve title / url / description format

Next: modestly improve
- handle more results: > 50
- basic n-gram extraction
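A minimal sketch of what "basic n-gram extraction" over result snippets might look like. The tokenization rule, the function names, and the sample snippets are assumptions; the talk did not describe the actual extraction method.

```python
import re
from collections import Counter

def extract_ngrams(text, n=2):
    """Return the word n-grams of text, lower-cased, in order of appearance."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def common_ngrams(snippets, n=2, top=5):
    """Return the most frequent n-grams across a list of snippets."""
    counts = Counter()
    for snippet in snippets:
        counts.update(extract_ngrams(snippet, n))
    return counts.most_common(top)

if __name__ == "__main__":
    # Hypothetical result titles, for illustration only.
    snippets = ["Digital camera reviews", "Best digital camera"]
    print(common_ngrams(snippets, n=2, top=3))
```

Frequent n-grams from titles and descriptions could then be offered as candidate facet values for a result set of more than 50 items.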

Not done
- Document classes as facets
- Document similarity as synthetic facets
- Folksonomies and community tags
- Federation and verticals

Vicious cycle of the web
- Easy to create -> more people create
- More stuff created -> harder to find good stuff

What's the cure?
We desperately need a mode of interaction where the whole of the data is greater than the sum of the parts.

Wisdom > knowledge > information > data

- For facets: word frequency with stop words from abstracts and titles (with just a little cleanup)
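The facet recipe in the last note can be sketched as follows: tokenize titles and abstracts, drop stop words, and keep the terms that occur in enough documents to be useful for pivoting. The stop word list, the `min_docs` threshold, the function name `build_facets`, and the sample documents are all assumptions for illustration.

```python
import re
from collections import defaultdict

# A deliberately tiny, assumed stop word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is"}

def build_facets(docs, min_docs=2):
    """Map each facet term to the sorted ids of the documents containing it.

    docs maps a document id to a (title, abstract) pair. Terms that appear
    in fewer than min_docs documents are dropped.
    """
    postings = defaultdict(set)
    for doc_id, (title, abstract) in docs.items():
        for word in re.findall(r"[a-z]+", f"{title} {abstract}".lower()):
            if word not in STOP_WORDS:
                postings[word].add(doc_id)
    return {word: sorted(ids)
            for word, ids in postings.items()
            if len(ids) >= min_docs}

if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    docs = {
        1: ("Camera review", "A review of the camera"),
        2: ("Lens review", "Notes on a lens"),
        3: ("Camera bag", "A bag for the camera"),
    }
    print(build_facets(docs))
```

The term-to-document mapping is a small bipartite graph between facet values and items, which is the kind of structure the server side would need to clean and augment before handing a collection to the client.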
