Tuesday, July 21

SIGIR Keynote: From networks to human biology by Albert-László Barabási

Here are my raw notes from the keynote address.

Update: My Flickr album includes photos from the keynote. I'm behind on labeling, but they're the cool looking network images.

He will talk about all kinds of networks. When thinking about large networks most people think of the Internet, a system of hardware routers and cables.

Structure of an organization
Another network: The social network. Each node is a person. The color is a department and the links are communications between them.

Understanding the structure of the social network, what advantage does that give the company? (www.orgnet.com) The social network defines how a company performs, or how successful a person is in pursuing their goals.

The Genome
Humans have about 25,000 genes, roughly three times as many as the fly, and fewer than the onion, which has about 40,000. So how do we explain the complexity differences between organisms? Human complexity seems unlikely to come from sheer gene count; there is no correlation between the number of genes and intelligence. It has to do with the internal network: how the genes are connected.

What makes a system complex?
Complex Systems are made of many non-identical elements connected by diverse interactions.

How do we model and analyze networks?
Pál Erdős (1913 – 1996), a Hungarian mathematician, lived out of his suitcase and moved from place to place. He stayed only as long as it took to wear out his welcome, usually about five days, then moved on. This made him one of the most networked academics of his time…

Erdos-Renyi model
  • A network is simple: nodes and links
  • Generate a network that has the complexity of the real network (democratic, random): connect each pair of nodes with a probability p.
Animation: Start with nodes floating in space and connect random pairs. Eventually you get one big glob – a magical moment. The result is a Poisson degree distribution.

A Poisson distribution: if it modeled how a society forms friendships, we would all have roughly the same number of friends. It would be very democratic. Do we believe that real networks are random? Intuitively the answer is no. We're here because there is order (not randomness). We need to find the signature of the non-randomness.
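As a concrete illustration, here is a minimal Python sketch of the Erdős–Rényi construction (the function name and parameters are mine, not from the talk). With n = 2000 and p chosen so the expected degree is 10, the degrees cluster tightly around the mean, the "democratic" Poisson shape described above:

```python
import random

def erdos_renyi(n, p, seed=0):
    """Build an Erdos-Renyi G(n, p) graph: every pair of nodes is
    connected independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

n, p = 2000, 0.005            # expected degree n * p = 10
adj = erdos_renyi(n, p)
degrees = [len(neigh) for neigh in adj.values()]
mean_deg = sum(degrees) / n
# Degrees cluster around the mean (a Poisson shape): almost everyone
# has close to n * p links, and extreme hubs essentially never appear.
print(round(mean_deg, 1), max(degrees))
```

The point of the sketch is the contrast with what comes next: here the largest degree is only modestly above the average, with no hubs in sight.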

They selected the WWW as a model of a large graph: around a trillion documents. They expected a degree of randomness, but when they looked at the data they found a power-law distribution: a small number of nodes with very large numbers of links. They compared it with the road network, and it's very different. The web is much more like the hub-and-spoke system (a scale-free network) used by airlines.

The Internet backbone, too, is a scale-free network. (We're talking about just the backbone here, as distinguished from the Internet as a whole.)

He also showed a similar trend in scientific co-authorship: nodes are people and links are working relationships. Most people have a few collaborators; a few have many. The story is similar with academic citations, i.e. how papers refer to each other.

The same trend shows up in the Swedish sex web (nodes: females and males; links: sexual relationships). Liljeros et al., Nature 2001. They were looking at how AIDS spreads, and they saw the same underlying scale-free network.
  • Why do all these networks converge to a similar architecture?
Erdős and Rényi made the assumption that as you place the links, the number of nodes doesn't change, which is different from the networks we're observing, which grow.

Origin of scale-free networks: growth and preferential attachment.
  1. Growth: networks continuously expand by the addition of new nodes.
  2. Preferential attachment: each new node arrives with m links and prefers to connect to highly connected nodes; you're biased toward more connected nodes.
These two properties are necessary and sufficient for the emergence of scale-free networks. Both are essential ingredients, although there is much more going on in real networks (nodes get removed, links change, etc.).
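The two rules above, growth plus preferential attachment, can be sketched in a few lines of Python (a hedged illustration; the sampling-pool trick and parameter values are mine, not the speaker's). A node appears in the pool once per link endpoint, so picking uniformly from the pool is exactly degree-proportional sampling:

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a network one node at a time; each newcomer adds m links,
    picking targets with probability proportional to current degree
    (a sketch of the Barabasi-Albert model)."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # small seed clique
    pool = [i for i in range(m + 1) for _ in range(m)]  # one entry per link end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))   # degree-proportional pick
        adj[new] = targets
        for t in targets:
            adj[t].add(new)
            pool += [new, t]
    return adj

adj = barabasi_albert(5000, 3)
degrees = sorted((len(v) for v in adj.values()), reverse=True)
# A handful of large hubs and a mass of small nodes: the heavy tail.
print(degrees[:3], degrees[len(degrees) // 2])
```

Unlike the Erdős–Rényi graph, the largest node here ends up with a degree far above the median, precisely the hub structure seen in the web and airline examples.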

See A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)

Fitness Model: Can Latecomers Make It?
- Scale-free model: first-mover advantage.
- How can you adjust the model to account for latecomers? Give each node a fitness, a fuzzy quantity that tells you how attractive the node is: what is the likelihood that you will connect to it once exposed?

See Bianconi & Barabasi, Physical Review Letters 2001; Europhys. Lett. 2001.

With fitness, a latecomer may take over the whole network. You don't typically see this in a basic scale-free network without the fitness property.
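A rough sketch of the fitness idea in Python, under loudly stated assumptions: integer fitnesses, attachment probability proportional to degree times fitness, and one hypothetical latecomer who is five times "fitter" than everyone else. All names and numbers here are illustrative, not from the Bianconi–Barabási papers:

```python
import random

def fitness_growth(n, m, fitness, seed=0):
    """Fitness-model sketch: a node sits in the sampling pool
    deg * fitness times, so attachment probability is proportional
    to degree * fitness rather than degree alone."""
    rng = random.Random(seed)
    deg = {i: m for i in range(m + 1)}                      # small seed core
    pool = [i for i in range(m + 1) for _ in range(m * fitness[i])]
    for new in range(m + 1, n):
        for _ in range(m):                                  # m links per newcomer
            t = rng.choice(pool)
            deg[t] += 1
            pool += [t] * fitness[t]                        # keep weights in sync
        deg[new] = m
        pool += [new] * (m * fitness[new])
    return deg

n, m = 4000, 2
fitness = [1] * n
late = n // 10          # hypothetical latecomer, arriving after 10% of the growth
fitness[late] = 5       # ...but five times "fitter" than everyone else
deg = fitness_growth(n, m, fitness)
ordinary = deg[late + 1]  # a node of the same age with ordinary fitness
print(deg[late], ordinary)
```

Despite arriving late, the high-fitness node accumulates far more links than its same-age peers, which is the latecomer-takeover effect the talk describes.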

These are man-made networks. Is this a result of what we do, or is there something more fundamental?

- Protein–gene network (genome), protein–protein interactions (proteome), biochemical reactions (metabolism).

- Is it only yeast and bacteria, or do we carry scale-free networks too? Rual et al., Nature 2005; Stelzl et al., Cell 2005. Yes, we do. So what?

  • Robustness: complex systems maintain their basic functions even under errors and failures. (Animals function despite cell mutations; the Internet functions despite broken routers.)
  • In a network, what happens when nodes fail? As we remove nodes, the network starts breaking apart. Track the fraction of nodes in the largest connected component; the critical point, where too many nodes have been removed, can be calculated.
  • For random removals in a scale-free network, the critical fraction approaches fc = 1: you have to remove nearly all the nodes before the network falls apart.
  • In a scale-free network most nodes are small, so if nodes are chosen randomly, you will likely pick one that is not highly connected (hubs are rare).
  • However, if you start at the largest node and remove hubs in order, within 10–15 removals the network is shattered. Attack is its Achilles' heel. (Now from structure to behavior...)
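The failure-versus-attack asymmetry is easy to demonstrate. The sketch below (my own illustration; the graph generator, node counts, and the 10% removal fraction are all assumptions for the demo, not figures from the talk) grows a scale-free graph, then compares the surviving giant component after removing 10% of nodes at random versus removing the top 10% of hubs:

```python
import random
from collections import deque

def scale_free_graph(n, m, seed=0):
    """Preferential-attachment graph (sketch), as an adjacency dict."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    pool = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        adj[new] = targets
        for t in targets:
            adj[t].add(new)
            pool += [new, t]
    return adj

def giant_fraction(adj, removed):
    """Fraction of all nodes left in the largest connected component
    after deleting the nodes in `removed` (BFS over survivors)."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)

n = 2000
adj = scale_free_graph(n, 2)
k = n // 10                                       # knock out 10% of the nodes
rng = random.Random(1)
failures = set(rng.sample(sorted(adj), k))        # random failures
attack = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])  # hubs first
f_fail = giant_fraction(adj, failures)
f_attack = giant_fraction(adj, attack)
print(round(f_fail, 2), round(f_attack, 2))
```

Random failures leave the giant component nearly intact, while the targeted attack on hubs fragments it far more, the Achilles' heel from the bullet above.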

When do things happen?
- Basic assumption: a Poisson process, where events take place randomly at a constant rate. Is this a good model? Do we really behave this way?

In the case of e-mail, the pattern is bursty: periods of inactivity and periods of intense activity. The distribution of inter-event times is more of a power law (long tail) than the constant-rate Poisson model predicts. The same patterns emerge for library usage, web browsing behavior, document printing, cell phone calls, etc. This means you'll have a burst of behavior followed by a pause, with sparse events interspersed.

Prioritized Task Execution
Where does this come from? How do we decide what to do next? Imagine you have a to-do list. Do you throw a die and decide what to do? Uh, no. Most of us assign priorities. We deem some things to be important, and others less so.

Imagine a model where you prioritize each task. You execute the highest-priority task you can right now, and when you finish you have an opening. Meanwhile you may have added another task to your to-do list, which you prioritize. Repeat. So how long do things stay on your to-do list? (In the model, with some probability you execute a random task instead of the highest-priority one.) This explains the power-law distribution: just the fact that you prioritize things creates the burstiness and the power law.
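A small Python sketch of this kind of priority queue (my own hedged reconstruction of the model described above; the list length, probability, and step count are illustrative choices, not the speaker's):

```python
import random

def priority_queue_model(steps, L=5, p=0.99, seed=0):
    """Priority-queue sketch: keep L pending tasks, each with a random
    priority. At each step, with probability p execute the
    highest-priority task, otherwise a random one; the executed task
    is replaced by a fresh task with a new random priority."""
    rng = random.Random(seed)
    tasks = [(rng.random(), 0) for _ in range(L)]   # (priority, arrival step)
    waits = []
    for step in range(1, steps + 1):
        if rng.random() < p:
            i = max(range(L), key=lambda k: tasks[k][0])
        else:
            i = rng.randrange(L)
        waits.append(step - tasks[i][1])            # waiting time of executed task
        tasks[i] = (rng.random(), step)             # a new task arrives
    return waits

waits = priority_queue_model(50_000)
waits.sort()
# Most tasks are done almost immediately, but a few low-priority tasks
# languish for a very long time: heavy-tailed waits, hence bursty output.
print(waits[len(waits) // 2], waits[-1])
```

The median wait is tiny while the longest waits are orders of magnitude larger, exactly the heavy-tailed, bursty signature the talk attributes to prioritization.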

Q: Does the length of the queue matter? No; the distribution is the same.
- Just the fact that you prioritize creates the bursty pattern.

See Vazquez, PRL 95, 248701 (2005)

Universality Classes
- Is there life beyond α = 1?
- The letters of Einstein, Darwin. (pre-email)
Einstein sent 14,500 letters and received 16,200.
Darwin sent 7,591 letters and received 6,530.
Graphs of number of letters per year, and response time for letters look very similar.
- The response times for their letters follow the same power-law distribution (with exponent α = 3/2).

Cobham model, 1954. Both men were overwhelmed with the number of letters and could not keep up.

Thinking about networks: Is it just a fad?
Lorenz’s paper on chaos (1963): in the 1970s citations took off, and today it’s perhaps the most referenced paper on complexity. [He showed us other graphs of the citation growth of popular papers over time.]

The three most cited network papers’ citations are off the charts. Nothing like it has ever been seen before in the history of academic literature. What’s going on?

What is “network science”?
An attempt to understand networks emerging in nature, technology and society using a unified set of tools and principles.

He displayed a graph of how network science citations have exploded over the last decade, driven by interdisciplinary recognition of networks as a phenomenon. The study of this as a science, he argues, follows from the fact that "networks emerge and evolve driven by a fundamental set of laws and mechanisms".

Each field has adopted network science as its own, and thinks they invented it.

It was noted that these types of networks are vulnerable to attacks; they are not the safest. However, it's more likely that they represent a trade-off between efficiency and safety (my interpretation). The most efficient network is a single hub with spokes, and the chance of bringing down that network with one random failure is very small, only 1/n (you would have to hit the hub). The power-law network has a relatively small number of such hubs.

Jeremy asked about the implications that the network structure has for the discoverability of information. Namely, that it will be harder to find information from non-hub sources. The response was that on the Internet:
"Everybody has a voice, but not everybody is heard."
He added in a follow-up that Google will be hard for Bing and other competitors to beat because it has such a large lead. Their fitness has to be significantly higher than Google's, which means they need to be significantly better/different (... or use their desktop monopoly, my thought).

1 comment:

  1. Great, thanks for the summary. Do note however Mathematics and the Internet: A Source of Enormous Confusion and Great Potential for a sceptical look at the application of scale-free networks in one particular domain, namely, the modelling of the physical routing infrastructure of the internet.