Thursday, October 27

CIKM 2011 Industry: Toward Deep Understanding of User Behavior on the Web

Toward Deep Understanding of User Behavior on the Web
Vanja Josifovski, Yahoo! Research

Where is user understanding going?

What is the future of the web?
 - prevalent - everyone and everything
 - mutual understanding

Personalized laptops

Personalization today
 - Search personalization.  low entropy of intent.  Difficult to improve over the baseline
 --> effects are small in practice

Content recommendation and ad targeting
 - High entropy of intent
 - Still very crude with relatively low success rates

How do we need to move to the next level
 - more data, better reasoning, and scale

Data today
 - searches, page views
 - connections: friends, followers, and others
 - tweets

The data we don't have
 - jetlagged, need a run? need a pint?, worried about government debit?
 - the observable state is very thin

How to get more user data?
 - Only with added value to the user
 - Must be motivated to provide their data

Privacy is not dead, it's hibernating
 - the impact of data leaks online is relatively small

Methods
  - State of the art as we know it.
  - Popular that seem to work well in pratice
  --> Learn relationship between features rij = xiCzj
  --> Dimensionality reduction (random, topical models, recommender systems rij = uivj)
  --> Use of extenral knowledge: smoothing
      --> taxonomies

An elaborate user topic model (Ahmed, KDD 2011, Smola et al. VLDB 2010), yet so so simple
 - the user behavior at time T is a mixture of his behavior at time t-1 + global overall behavior
 - Very simple model

Using External Knowledge
 - Aggrawal et all KDD2007, KDD 2010

Is there more to it?
 -> What is the relative merit of the methods?
 -> They use the data in the same way and are mathematically very similar

Where is the limit? 
  -> what is the upper bound on the performance increase on a given dataset with this family of algorithms?

Scale
 - Today - MapReduce is a limiting barrier for many algorithms
 - Need the right abstractions in parallel environments
 - Move towards shared in memory, messages passing models (like Giraph)
 -- (we'll work this out)

Workflow complexity
 - the reality bites Hatch et al. CIKM 2011.  Massive workflows that run for hours.

Summary


CIKM
1) Deep user understanding - the tale of three communities

IR:
 - Good formalism that function practice
 - emphasis on metrics and standard collections

DB
 - seamless running of complex algorithms
 - new parallel computation paradigms


Towards deeper understanding
1) get users to give you more data by providing value
2) significantly increase the complexity of the models
3) scale in terms of data and system complexity


1 comment: