Friday, August 20

GraphLab: Beyond MapReduce for Parallel Machine Learning

A team at the CMU Select Lab recently released a new software package called GraphLab, which provides an alternative to the MapReduce paradigm for developing machine learning algorithms. The work is described in the paper:

Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein (2010). "GraphLab: A New Parallel Framework for Machine Learning." Conference on Uncertainty in Artificial Intelligence (UAI). (PPT slides)

From the description on the website,
GraphLab provides a similar analog to the Map in the form of an Update Function. The Update Function however, is able to read and modify overlapping sets of data... In addition the update functions can be recursively triggered with one update function spawning the application of update functions to other vertices in the graph enabling dynamic iterative computation...

The GraphLab analog to Reduce is the Sync Operation. The Sync Operation also provides the ability to perform reductions in the background while other computation is running. Like the update function sync operations can look at multiple records simultaneously providing the ability to operate on larger dependent contexts.
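To make the two abstractions concrete, here is a minimal sequential sketch in Python, assuming a toy PageRank computation on a three-vertex graph. The names `pagerank_update`, `run_dynamic`, and `sync` are hypothetical and are not GraphLab's actual API. The FIFO loop stands in for GraphLab's parallel engine and ignores the locking a real engine needs when update scopes overlap. The sync here also runs after the updates finish rather than in the background.

```python
from collections import deque
from functools import reduce
import operator

# Hypothetical toy directed graph: vertex -> list of out-neighbours.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

def in_neighbors(graph):
    """Invert the adjacency lists so each vertex knows who links to it."""
    inn = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            inn[v].append(u)
    return inn

def pagerank_update(v, rank, graph, inn, damping=0.85, tol=1e-6):
    """Hypothetical update function: reads the ranks of v's in-neighbours
    (a scope that overlaps other vertices' scopes), rewrites rank[v], and
    returns the vertices to reschedule because they read rank[v]."""
    new = (1 - damping) + damping * sum(rank[u] / len(graph[u]) for u in inn[v])
    change = abs(new - rank[v])
    rank[v] = new
    # Only trigger further updates if this vertex actually moved.
    return graph[v] if change > tol else []

def run_dynamic(graph, rank, update, max_updates=100_000):
    """Sequential FIFO scheduler standing in for a parallel engine."""
    inn = in_neighbors(graph)
    queue = deque(graph)      # initially schedule every vertex once
    scheduled = set(graph)    # avoid queuing the same vertex twice
    updates = 0
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        for w in update(v, rank, graph, inn):
            if w not in scheduled:
                scheduled.add(w)
                queue.append(w)
        updates += 1
        if updates > max_updates:
            raise RuntimeError("update loop did not converge")
    return updates

def sync(data, map_fn, combine_fn, init):
    """Sync-style reduction: map over every vertex's data, then fold."""
    return reduce(combine_fn, (map_fn(v, x) for v, x in data.items()), init)

rank = {v: 1.0 for v in GRAPH}
n_updates = run_dynamic(GRAPH, rank, pagerank_update)
total = sync(rank, lambda v, r: r, operator.add, 0.0)
print(n_updates, round(total, 4))
```

The point of the sketch is the scheduling pattern: an update only reschedules its out-neighbours when its own value changed by more than `tol`, so work concentrates on the parts of the graph that are still moving instead of sweeping every record in each round as a chain of MapReduce jobs would. Because this graph has no dangling vertices, the un-normalized ranks at the fixed point sum to the number of vertices, which is what the `sync` reduction reports.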
In addition to the paper, you can read the details page for more information.

I need to think about this more.