Christos Faloutsos, Professor at Dept. of Computer Science, CMU, gave an excellent keynote this morning on pattern mining in large scale graphs. Below is a brief summary of this keynote.
· Why should we care about graphs?
o Internet Map
o Predators network in nature
o Social networks
· Patterns in Graphs
o What is normal/abnormal?
o Which patterns hold in large datasets?
· Types of patterns
o Graphs are *not* uniformly distributed
o Power law in the degree distribution
o Power law in the eigenvalues of the adjacency matrices
o Triangle law. (Triangle is a three node clique in the graph).
§ Power law in the distribution of triangles in graphs
§ #triangles = 1/6*SUM(EV^3) (EV= Eigenvalue of the adjacency matrix of the graphs)
· Can be used to speed up computation over large graphs
§ Principal component analysis shows that many large scale graphs are usually loosely connected tight communities
· Patterns on weighted graphs
o Weights are super linear in the in-degree of the graph
· Time evolution in graphs
o Surprisingly, graph diameter shrinks over time as the graph grows
o EDGES(t+1) = 1.6*NODES(t+1)
o Node popularity over time (eg. visits to blog posts)
§ Drop off in popularity is power law with exponent 1.6
o Duration of tasks (eg., duration of phone calls)
§ The longer the task has taken so far, the longer it is expected to take in the future
§ Similar to log-logistic distribution in survivability analysis
· Tools for anomaly detection in graphs
o Egonet of the node –neighbors of the node and the edges between them
§ Examining the features of the egonets can be used to detect anomalous nodes
o Belief propagation can be used for fraud detection
o Map-Reduce for graphs
§ PEGASUS – tool for parallelizing computation of various graph characteristics for large graphs (using Hadoop)
· Graph Radius
o 14 for a web graph of 1.4B nodes
o Many loosely connected tight communities
· Connected components
o Giant connected component ~ 1B nodes
o Many disconnected nodes
o Many suspicious connected components (link spam)