Vanja Josifovski, Yahoo! Research
Where is user understanding going?
What is the future of the web?
- prevalent - everyone and everything
- mutual understanding
Personalized laptops
Personalization today
- Search personalization: low entropy of intent, so it is difficult to improve over the baseline
--> effects are small in practice
Content recommendation and ad targeting
- High entropy of intent
- Still very crude with relatively low success rates
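One way to make "entropy of intent" concrete is the Shannon entropy of the distribution over what users click for a given query or context. The sketch below assumes that proxy; the click logs are hypothetical and only illustrate the contrast between a navigational query (low entropy) and open-ended content browsing (high entropy).

```python
import math
from collections import Counter

def intent_entropy(clicks):
    """Shannon entropy (bits) of the empirical distribution over clicked targets."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical logs: a navigational query where almost everyone wants one result...
navigational_query = ["yahoo.com"] * 95 + ["mail.yahoo.com"] * 5
# ...versus homepage browsing spread evenly over eight content categories.
homepage_visits = ["sports", "finance", "news", "movies", "travel", "autos", "music", "tech"]

print(round(intent_entropy(navigational_query), 3))  # 0.286
print(intent_entropy(homepage_visits))               # 3.0
```

With little uncertainty left in the navigational case, personalization has little room to help, which is consistent with the small effects noted for search.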
How do we move to the next level?
- more data, better reasoning, and scale
Data today
- searches, page views
- connections: friends, followers, and others
- tweets
The data we don't have
- jet-lagged? need a run? need a pint? worried about government debt?
- the observable state is very thin
How to get more user data?
- Only with added value to the user
- Users must be motivated to provide their data
Privacy is not dead, it's hibernating
- the impact of data leaks online is relatively small
Methods
- State of the art as we know it.
- Popular methods that seem to work well in practice
--> Learn relationships between features: $r_{ij} = x_i^\top C z_j$
--> Dimensionality reduction (random projections, topic models, recommender systems: $r_{ij} = u_i^\top v_j$)
--> Use of external knowledge: smoothing
--> taxonomies
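The two scoring forms listed above can be sketched side by side: a bilinear model, where a matrix C weights every (user feature, item feature) pair, and a low-rank factorization, where each user and item gets a latent vector. Fitting the factors by stochastic gradient descent with L2 regularization is an assumption here (one common choice, not something stated in the talk), and the hyperparameters and toy data are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bilinear_score(x_i, C, z_j):
    """r_ij = x_i^T C z_j: C[p][q] weights user feature p against item feature q."""
    return sum(x_i[p] * C[p][q] * z_j[q] for p in range(len(x_i)) for q in range(len(z_j)))

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit r_ij ~ u_i . v_j by SGD over observed (i, j, r) triples (hypothetical defaults)."""
    rnd = random.Random(seed)
    U = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - dot(U[i], V[j])
            for f in range(k):
                u, v = U[i][f], V[j][f]  # use pre-update values for both factors
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V

# Toy data: a rank-1 user-by-item matrix, fully observed.
u_true, v_true = [1.0, 2.0, 0.5], [1.0, 0.5, 2.0]
ratings = [(i, j, u_true[i] * v_true[j]) for i in range(3) for j in range(3)]
U, V = factorize(ratings, n_users=3, n_items=3)
rmse = (sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 3))
```

Both forms are linear in each set of parameters, which is one way to see why these methods end up "mathematically very similar".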
An elaborate user topic model (Ahmed, KDD 2011; Smola et al., VLDB 2010), yet very simple
- the user's behavior at time t is a mixture of their behavior at time t-1 and the global overall behavior
- Very simple model
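A deliberately simplified illustration of that mixture, assuming a user's interests are a distribution over topics: the prior for time t blends the user's distribution at t-1 with the global distribution, and the current period's observed activity then updates it with Dirichlet-style smoothing. The smoothing step, the weights `lam` and `alpha`, and the topic names are all hypothetical; this is not the inference procedure of the cited papers.

```python
def update_user_interests(prev, global_dist, observed_counts, lam=0.7, alpha=10.0):
    """One time step of a toy dynamic-interest model (lam and alpha are hypothetical).

    prior_t    = lam * prev + (1 - lam) * global_dist        (user's past + global trend)
    estimate_t = (alpha * prior_t + counts_t) / (alpha + n_t)  (smooth with today's activity)
    """
    topics = set(prev) | set(global_dist) | set(observed_counts)
    prior = {k: lam * prev.get(k, 0.0) + (1 - lam) * global_dist.get(k, 0.0) for k in topics}
    n = sum(observed_counts.values())
    return {k: (alpha * prior[k] + observed_counts.get(k, 0)) / (alpha + n) for k in topics}

prev = {"sports": 0.8, "finance": 0.2}                           # user at t-1
global_dist = {"sports": 0.3, "finance": 0.3, "elections": 0.4}  # everyone at t
today = {"finance": 3, "elections": 2}                           # user's clicks at t
est = update_user_interests(prev, global_dist, today)
print({k: round(v, 3) for k, v in sorted(est.items())})
```

The global term lets a user pick up a trending topic ("elections" here) before they have much individual history on it.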
Using External Knowledge
- Agarwal et al., KDD 2007, KDD 2010
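A minimal sketch of smoothing with a taxonomy, assuming a generic hierarchical shrinkage rule: a node's click-through rate is pulled toward its parent's smoothed rate, with strength `k` acting as a pseudo-count. This is a common empirical-Bayes style device, not necessarily the model in the cited papers; the taxonomy, counts, and `k` are hypothetical.

```python
def smoothed_ctr(node, stats, parent, k=100.0, cache=None):
    """ctr(node) = (clicks + k * ctr(parent)) / (impressions + k); the root uses its raw rate."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    clicks, imps = stats[node]
    p = parent.get(node)
    if p is None:
        value = clicks / imps  # root: enough data to trust the raw rate
    else:
        value = (clicks + k * smoothed_ctr(p, stats, parent, k, cache)) / (imps + k)
    cache[node] = value
    return value

# Hypothetical ad taxonomy: (clicks, impressions) per node.
stats = {"ads": (1000, 100000), "travel": (300, 20000), "ski-resorts": (2, 10)}
parent = {"travel": "ads", "ski-resorts": "travel"}
print(round(smoothed_ctr("ski-resorts", stats, parent), 4))
```

With only 10 impressions, the raw rate of 0.2 for the leaf is unreliable; shrinkage moves it most of the way toward its parent's estimate, while well-observed nodes stay close to their own data.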
Is there more to it?
-> What is the relative merit of the methods?
-> They use the data in the same way and are mathematically very similar
Where is the limit?
-> What is the upper bound on the performance gain achievable on a given dataset with this family of algorithms?
Scale
- Today - MapReduce is a limiting barrier for many algorithms
- Need the right abstractions in parallel environments
- Move towards shared in-memory and message-passing models (like Giraph)
-- (we'll work this out)
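To show what the message-passing abstraction looks like, here is a single-process simulation of the vertex-centric, bulk-synchronous style that Giraph follows, using the textbook example of connected components by minimum-label propagation over a friend graph. It does not use Giraph's Java API; the graph and names are hypothetical.

```python
from collections import defaultdict

def connected_components(edges):
    """Min-label propagation, simulated as BSP supersteps in one process.

    Each vertex keeps its state (a label) in memory; in every superstep it reads
    its incoming messages, updates its label, and messages its neighbors only if
    the label changed. The computation halts when no messages are in flight.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    label = {v: v for v in neighbors}
    inbox = defaultdict(list)
    for v in neighbors:  # superstep 0: every vertex announces its own label
        for n in neighbors[v]:
            inbox[n].append(label[v])
    while inbox:
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < label[v]:
                label[v] = best
                for n in neighbors[v]:
                    outbox[n].append(best)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return label

friends = [("ann", "bob"), ("bob", "cy"), ("dee", "eve")]
print(sorted(connected_components(friends).items()))
# [('ann', 'ann'), ('bob', 'ann'), ('cy', 'ann'), ('dee', 'dee'), ('eve', 'dee')]
```

Unlike a chain of MapReduce jobs, the graph and vertex state stay in memory across iterations, and only changed vertices send messages.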
Workflow complexity
- Reality bites (Hatch et al., CIKM 2011): massive workflows that run for hours.
Summary
CIKM
1) Deep user understanding - the tale of three communities
IR:
- Good formalisms that work in practice
- emphasis on metrics and standard collections
DB
- seamless running of complex algorithms
- new parallel computation paradigms
Towards deeper understanding
1) get users to give you more data by providing value
2) significantly increase the complexity of the models
3) scale in terms of data and system complexity