Friday, February 11

WSDM 2011 - Best paper awards

Best paper award: Unbiased Offline Evaluation of Contextual bandit based News Article Recommendation Algorithms by Lihong Li, Wei Chu, John Langford and Xaunhui Wang

Best student paper award: Correcting for Missing Data in Information Cascades by Eldar Sadikov, Montserrat Medina, Jure Leskovec and Hector Garcia-Molina

Thursday, February 10

"Bing Dialogue Model"by Harry Shum - Second WSDM 2011 Keynote

The 2nd keynote talk at WSDM 2011 was an intriguing peak at the Bing's model of user intent, by Harry Shum, VP of Search Product Development, Microsoft.

* Challenges at launch
* Google market share has been steadily growing from 2005-2008 (Bing launch)
* Google is a consumer brand and a habit

* Bing gained 5.1% query traffic share (worldwide) since launch

* 3 elements of Search Quality
1) Relevance
  • Ranking based on meaning not keywords
  • Direct answers
2) Speed
  • Reduce effort to complete tasks
  • Direct answers
  • Fewer clicks
3) Ease of Use (User experience)
  • Intuitive query interface
  • Relevance is hard

* Demo of Bing features

1) “Quick access” – surfacing customer service phone # for query {delta airline}

2) Enhanced movie results for query that match movie titles )
* Rent, buy, watch online, reviews, posters

3) Microsoft Academic Search
* Faceted interface: filter results by author, venue
* Summary pages for author information
* Academic activity
* Co-author graph
* Disambiguation of author names

4) Summary of important information for queries that match geo-locations
* Weather
* Overview of tourist destinations
* Maps

5) Parsing natural language queries
* Parse the query {flight to Taipei feb 12 returning feb 13} to provide fast access to Bing Travel

6) Music search
* Enhanced results for queries that match musician names
* Preview songs, lyrics, bio

7) Facebook Integration
* Results that were liked by Facebook friends
* Surface Facebook profiles in searches of matching friend names

* Internet searchers are becoming more Task Centric
* Decision Making: 66% people are using search to make decisions
* Top search tasks: Entertainment, Games, Health, Travel, Shopping, Directions,…

* Tasks are becoming more sophisticated
* Longer queries
* Longer sessions

* 10 blue links are no longer sufficient
* Instead there’s a need for organized “Whole Page” experience
* Search Paradigm Shift
* From “hit or miss” model to “dialogue” model
* Understanding query intent
* Incorporating structured data into search results
* Relevance on the session level
* Minimize the effort to complete task

* Bing Dialogue Model (see the image above)

* Four levels of dialogue

1) Query level
* Query auto-completion
* Spelling correction
* Interaction with user in mobile devices with touch screens

2) Document level
* Title, snippet, deep links presentation
* Extended document preview on hover over the result

3) Page level
* Quick Tabs for relevant verticals
* Entity-based result summary
* Algorithmic results
* Related query suggestions
* Search history

4) Session level
* History-aware results comparisons

Wednesday, February 9

"Mining Billion-node Graphs: Patterns, Generators and Tools" Christos Faloutsos - First WSDM 2011 Keynote

Christos Faloutsos, Professor at Dept. of Computer Science, CMU, gave an excellent keynote this morning on pattern mining in large scale graphs. Below is a brief summary of this keynote.

· Why should we care about graphs?

o Internet Map

o Predators network in nature

o Social networks

· Patterns in Graphs

o What is normal/abnormal?

o Which patterns hold in large datasets?

· Types of patterns

o Graphs are *not* uniformly distributed

o Power law in the degree distribution

o Power law in the eigenvalues of the adjacency matrices

o Triangle law. (Triangle is a three node clique in the graph).

§ Power law in the distribution of triangles in graphs

§ #triangles = 1/6*SUM(EV^3) (EV= Eigenvalue of the adjacency matrix of the graphs)

· Can be used to speed up computation over large graphs

o EigenSpokes

§ Principal component analysis shows that many large scale graphs are usually loosely connected tight communities

· Patterns on weighted graphs

o Weights are super linear in the in-degree of the graph

· Time evolution in graphs

o Surprisingly, graph diameter shrinks over time as the graph grows

o NODES(t+1)=2*NODES(t)

o EDGES(t+1) = 1.6*NODES(t+1)

o Node popularity over time (eg. visits to blog posts)

§ Drop off in popularity is power law with exponent 1.6

o Duration of tasks (eg., duration of phone calls)

§ The longer the task has taken so far, the longer it is expected to take in the future

§ Similar to log-logistic distribution in survivability analysis

· Tools for anomaly detection in graphs

o Egonet of the node –neighbors of the node and the edges between them

§ Examining the features of the egonets can be used to detect anomalous nodes

o Belief propagation can be used for fraud detection

· Scalability

o Map-Reduce for graphs

§ PEGASUS – tool for parallelizing computation of various graph characteristics for large graphs (using Hadoop)

· Graph Radius

o 14 for a web graph of 1.4B nodes

o Many loosely connected tight communities

· Connected components

o Giant connected component ~ 1B nodes

o Many disconnected nodes

o Many suspicious connected components (link spam)

Monday, February 7

WSDM 2011 coverage

As I am attending WSDM 2011 in Hong Kong to present a paper, I will be taking over Jeff's blog for the next week. Expect posts about invited talks by Christos Faloutsos and Harry Shum, best paper award updates and more.

Comment if you have any wishes or questions.

Michael Bendersky