Tuesday, October 25

CIKM 2011 Keynote: David Karger

Creating User Interfaces that Entice People to Manage Better Information
By David Karger (MIT)

Haystack - Per-user Information Environments (1999)

Current State of IKM (Information and Knowledge Management)
  - We take users with extremely rich landscapes of information and give them keyboards to barely sketch their interests.  Algorithms work really hard on that sketch.

 - We work hard to make computers do IKM well
 - People are better than computers at IKM

  - In what ways can we give people the ability to manage more or better information?
 - How do we make them want to?

1) Capture more data digitally
2) Collaborate to understand lecture notes

Capture of Information Scraps  
 - The state of PIM
 - The desks all have computers, but we have huge piles of paper (never put into it)
 - 27 participants, 5 Orgs, 1 hour in situ interviews
#1 using computer is distracting / impossible
  -- people instead just grab random notes to write things down
  -- Interfaces for Staying in the Flow (Ben Bederson, Ubiquity 2004)
  -- (Being "in the zone", in the flow)

#2 chimeras fight between apps
  -- Meeting notes with TODOs and follow up meetings
#3 Diverse information forms don't fit apps
Types of information
  TODOs, meeting Notes, Name and Contact information

#4 Want in view at right time --workflow integration

Costs to digital capture
 - costs: effort to choose a place, imposed schema, entry time is a distraction
Fixes: no organization, plain text, in the browser, cross-computer sync offline+online

list.it (open source micro-note tool for Firefox)
 --> 25,000 downloads, 16,625 registered users, 920 volunteers, 116k contributed notes

Types of notes:
TODOs, Web bookmarks, Contact information
median time to write something is 7.4s
median number of lines is 4

35% - ease/speed
20% simplicity
20% direct replacement for post-its

Detour: Note Science
  -- How do people keep and access information in list.it?

3 coders
first clustered, identified 4 archetypes

MISC - MIT Open Scrap Corpus (available online)

NB: Classroom Discussion
Stellar Classroom discussion tool
 - 50 most active classes made 3275 posts
  -- no heavily populated posts
 - NB: forum in context (discussions happen in the margin of lecture notes)
 - Highlight a section of the notes, write a comment
 --> Implicit context (how do I get 3 from 1)

Benefits - Discuss as you read without exiting the note view
 -- Context is clear because the PDF content is there
 -- annotations create a heat map of lecture notes

15 classes, 4 different universities
(Annotation required), usage of the tool doubled over the term.
 --> they liked seeing that they weren't the only one that was confused.
  --> rich interaction

NB specific benefits
 --> "Why?"
 --> The social benefits outweighed the use of paper

 - Artificial Collaborative Filtering

Vast amounts of content, how do we get the good stuff
machine learning recommenders - users rate what they read, content recommendation, collaborative filtering (find people with similar likes, predict what they will like)
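
To make "find people with similar likes, predict what they will like" concrete, here is a minimal sketch of user-based collaborative filtering. It is not from the talk: the ratings table, the user names, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 4, "b": 5, "d": 5},
    "cat": {"a": 1, "c": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = cosine(ratings[user], theirs)
        num += weight * theirs[item]
        den += weight
    return num / den if den else None

# ann has not rated "d"; bob (similar tastes) liked it, cat (dissimilar) did not.
print(predict("ann", "d"))
```

The prediction leans toward bob's rating because his history looks more like ann's. It also shows the cost listed next: a user with no ratings yet gets no useful prediction.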

 - have to read lots of junk to train system
 - have to spend energy now for future benefit
 - many users won't ever get started

 - ML algorithms imperfect
 - Deliver irrelevant content to read, and users worry about what is missed

Alternative: People

Email is dominant in information sharing
Median 6 - people do want more relevant links
Sharers are reluctant to spam their friends
 (unsure of relevance, may have seen it already, too much effort)

-> let them use email, reassure the sender that the content is relevant and that the recipient isn't overloaded. One-click sharing

Firefox plugin
1. recommend recipients to reduce time and effort for sharing
 (uses ML to find people to recommend)

One-click thanks

Recommendation Algorithm
 -- Rocchio classifier
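
The notes only name the classifier, so the following is a rough sketch of how a Rocchio (nearest-centroid) classifier could rank recipients for a post. The bag-of-words features, the sample sharing history, and the function names are assumptions, not details of the actual plugin.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())

def centroid(docs):
    """Average term vector of the posts previously shared with one recipient."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return {term: count / len(docs) for term, count in total.items()}

def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(post, history):
    """Rank recipients by similarity between the post and their centroid."""
    vec = vectorize(post)
    centroids = {who: centroid(docs) for who, docs in history.items()}
    return sorted(centroids, key=lambda who: cosine(vec, centroids[who]), reverse=True)

# Hypothetical sharing history: recipient -> posts shared with them before.
history = {
    "alice": ["python search ranking", "search index tuning"],
    "bob": ["sourdough bread recipe", "bread oven tips"],
}
print(recommend("new search ranking paper", history))
```

Each recipient is treated as a class whose prototype is the centroid of what was shared with them, and a new post is matched to the closest prototypes.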

 - two week study for $30
 - 60 Google Reader users recruited on blogs
 - Viewed 85k posts, shared 713 posts
 - Significant increase in sharing

Recipients were happy - 80.4% of the posts contain novel content

Recommendations Useful

Do overload indicators help
 - 1/3 of subjects with them said they were their favorite feature
 - 30 of shares resulted in a thanks

Machine filtering
 - have to read stuff

Structured Data
We all know structured data is good.
it supports
 -> rich visualizations, filtering, sorting, queries, merging data

Epicurious (old version)
 -> filter by ingredient, cuisine, part of meal

Mere mortals just write text or HTML

Structured data takes skill
 - design a data model,

Plain authors are left behind
 -> less power to communicate effectively

Coping: Information Extraction
 - Entity Recognition, Coreference, Relationship Extraction

Imperfect, so errors creep in.

Alternative: Give regular people tools that let them author structured data
 -> to communicate well

Do we need this? Yes.

- HTML is the language of the web
 - Extend it to talk about data
 - Anyone authoring HTML should be able to author data and interactive visualization
- Edit data-html in web, blogs, wikis

(like spreadsheets)

Publishing data is easy, just put a spreadsheet online.  rows are items, columns are properties
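
As a small illustration of "rows are items, columns are properties", this sketch reads a spreadsheet exported as CSV into a list of items. The sample rows and column names are made up for the example.

```python
import csv
import io

# Hypothetical spreadsheet export: the header row names the properties.
SHEET = """title,source,cuisine
Lentil Soup,Magazine A,French
Pad Thai,Magazine B,Thai
"""

def load_items(text):
    """Turn each spreadsheet row into one item (a dict of property -> value)."""
    return list(csv.DictReader(io.StringIO(text)))

items = load_items(SHEET)
for item in items:
    print(item["title"], "-", item["cuisine"])
```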

 Items (recipes)
 - Each has properties: title, source magazine, publication date, etc.
 - Visualization - a view of a collection of data items
     -- bar chart, sortable list, map, thumbnail set

Bound to properties
 - sort by property

Facets for filtering information
 -> specify a property, user clicks to select
 -> templates -> format per item.
 - HTML with "fill in the blanks"
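
The facet and template ideas can be sketched in a few lines. This is not the Exhibit API (Exhibit is a JavaScript library); it is a Python illustration under assumed sample data, with hypothetical function names.

```python
from collections import Counter

# Hypothetical items; each dict is one row of the spreadsheet.
items = [
    {"title": "Pad Thai", "cuisine": "Thai", "course": "main"},
    {"title": "Tom Yum", "cuisine": "Thai", "course": "soup"},
    {"title": "Lentil Soup", "cuisine": "French", "course": "soup"},
]

def facet_counts(items, prop):
    """Values of one property with item counts, as a facet would list them."""
    return Counter(item[prop] for item in items)

def filter_items(items, prop, value):
    """Keep the items matching the facet value the user clicked."""
    return [item for item in items if item[prop] == value]

# Template: HTML with "fill in the blanks", applied once per item.
TEMPLATE = "<li><b>{title}</b> ({cuisine}, {course})</li>"

def render(items):
    return "\n".join(TEMPLATE.format(**item) for item in items)

print(facet_counts(items, "cuisine"))
print(render(filter_items(items, "course", "soup")))
```

The author only names a property for the facet and writes one template. The list of facet values and the per-item output are derived from the data.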

Key primitives of a data page
Data - a spreadsheet

Exhibit JavaScript library

1800 websites using exhibits
hobby stores, science
(lots of strange hobbyists)
Veggie guide to Glasgow

Not very scalable (fast for < 100 items)

Side effects - the data is out there.  (structured data is the side effect)

Datapress - data visualization inside the blog

- People are powerful information managers
In each case, it's about giving people the tools to be information managers

Wait, There's more
 --> manage structured data by making it look like a spreadsheet
 --> Atomate -> help users translate incoming data into structured data

We work hard to make computers do IKM well,
Don't assume people are passive IK consumers
Give people tools that can encourage active engagement in IKM

All the links are at haystack.csail.mit.edu/blog

The success of Exhibit came from why Haystack didn't succeed.  Lots of people using a tool is not the only measure of success.  It's still an interesting piece of research.