Wednesday, February 9

"Mining Billion-node Graphs: Patterns, Generators and Tools" Christos Faloutsos - First WSDM 2011 Keynote

Christos Faloutsos, Professor at Dept. of Computer Science, CMU, gave an excellent keynote this morning on pattern mining in large scale graphs. Below is a brief summary of this keynote.

· Why should we care about graphs?

o Internet Map

o Predators network in nature

o Social networks

· Patterns in Graphs

o What is normal/abnormal?

o Which patterns hold in large datasets?

· Types of patterns

o Graphs are *not* uniformly distributed

o Power law in the degree distribution

o Power law in the eigenvalues of the adjacency matrices

o Triangle law. (Triangle is a three node clique in the graph).

§ Power law in the distribution of triangles in graphs

§ #triangles = 1/6*SUM(EV^3) (EV= Eigenvalue of the adjacency matrix of the graphs)

· Can be used to speed up computation over large graphs

o EigenSpokes

§ Principal component analysis shows that many large scale graphs are usually loosely connected tight communities

· Patterns on weighted graphs

o Weights are super linear in the in-degree of the graph

· Time evolution in graphs

o Surprisingly, graph diameter shrinks over time as the graph grows

o NODES(t+1)=2*NODES(t)

o EDGES(t+1) = 1.6*NODES(t+1)

o Node popularity over time (eg. visits to blog posts)

§ Drop off in popularity is power law with exponent 1.6

o Duration of tasks (eg., duration of phone calls)

§ The longer the task has taken so far, the longer it is expected to take in the future

§ Similar to log-logistic distribution in survivability analysis

· Tools for anomaly detection in graphs

o Egonet of the node –neighbors of the node and the edges between them

§ Examining the features of the egonets can be used to detect anomalous nodes

o Belief propagation can be used for fraud detection

· Scalability

o Map-Reduce for graphs

§ PEGASUS – tool for parallelizing computation of various graph characteristics for large graphs (using Hadoop)

· Graph Radius

o 14 for a web graph of 1.4B nodes

o Many loosely connected tight communities

· Connected components

o Giant connected component ~ 1B nodes

o Many disconnected nodes

o Many suspicious connected components (link spam)

No comments:

Post a Comment