Monday, December 28

Dean and Ghemawat Strike Back on MapReduce

Jeff Dean and Sanjay Ghemawat wrote an article for the January edition of CACM, MapReduce: A Flexible Data Processing Tool. In the article, they dispute the findings of A Comparison of Approaches to Large-Scale Data Analysis. The authors of that paper had also written a post on their blog bashing MapReduce: MapReduce, A major step backwards. The post is no longer available, but thankfully Greg had good coverage.

In the article, Dean and Ghemawat address the paper and attempt to debunk its claims, although they lack the benchmarks to back up their rebuttal. In the process, they explain how to run M/R jobs efficiently:
  1. Avoid starting processes for each new job; reuse workers.
  2. Shuffle data carefully; avoid O(M*R) disk seeks.
  3. Beware of text storage formats.
  4. Use natural indices like timestamps on files.
  5. Do not merge reducer output.
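
As a rough illustration of item 4, here is a minimal Python sketch of using timestamps in file names as a natural index, so that a job only opens the files that fall inside the time range it cares about. The `log-YYYY-MM-DD.dat` naming scheme and the function names are assumptions made up for this example; they do not come from the article.

```python
from datetime import date

# Hypothetical naming scheme for input files: "log-YYYY-MM-DD.dat".
PREFIX = "log-"
SUFFIX = ".dat"

def file_date(name):
    """Return the date encoded in a file name, or None if it does not match."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None
    try:
        return date.fromisoformat(name[len(PREFIX):-len(SUFFIX)])
    except ValueError:
        return None

def select_inputs(names, start, end):
    """Keep only the files whose date lies in [start, end] (inclusive)."""
    selected = []
    for name in names:
        d = file_date(name)
        if d is not None and start <= d <= end:
            selected.append(name)
    return sorted(selected)

names = [
    "log-2009-12-26.dat",
    "log-2009-12-27.dat",
    "log-2009-12-28.dat",
    "log-2009-12-29.dat",
    "README",
]
inputs = select_inputs(names, date(2009, 12, 27), date(2009, 12, 28))
```

The job is then handed `inputs` rather than the whole directory, so data outside the range is never read at all.
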
They present some good M/R lessons in their refutation. Use a binary serialization system like Avro or Protocol Buffers, and store your data in a format that provides efficient access, either through a natural file structure or through a database system like HBase.
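
To make the text-versus-binary point concrete, here is a small Python sketch. It uses the standard library's `struct` module as a stand-in for a real serialization system such as Avro or Protocol Buffers, and the record layout (timestamp, user id, score) is a hypothetical one chosen for the example. The text decoder has to split and convert every field of every line, while the fixed-width binary layout can be unpacked directly and lets you jump straight to the n-th record.

```python
import struct

# Hypothetical record layout: (timestamp: int64, user_id: int32, score: float64).
# "<" means little-endian with no padding, so each record is exactly 20 bytes.
RECORD = struct.Struct("<qid")

def encode_text(records):
    """Encode records as tab-separated text lines."""
    lines = (f"{ts}\t{uid}\t{score!r}\n" for ts, uid, score in records)
    return "".join(lines).encode("ascii")

def decode_text(blob):
    """Parse text lines; every field must be split and converted."""
    out = []
    for line in blob.decode("ascii").splitlines():
        ts, uid, score = line.split("\t")
        out.append((int(ts), int(uid), float(score)))
    return out

def encode_binary(records):
    """Pack records back to back in the fixed-width layout."""
    return b"".join(RECORD.pack(ts, uid, score) for ts, uid, score in records)

def decode_binary(blob):
    """Unpack all fixed-width records without any string parsing."""
    return list(RECORD.iter_unpack(blob))

def read_nth(blob, n):
    """Read record n directly by offset, without scanning earlier records."""
    return RECORD.unpack_from(blob, n * RECORD.size)

records = [(1262000000 + i, i % 1000, i * 0.5) for i in range(1000)]
text_blob = encode_text(records)
binary_blob = encode_binary(records)
```

Both encodings round-trip the same data. This sketch makes no claim about which one is smaller on disk; the differences it shows are the per-field parsing on the text path and the direct offset access in `read_nth` on the binary path.
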

12 comments:

  1. The "MapReduce is backwards" article moved after some time and has now completely disappeared. I suspect it has something to do with Vertica recently partnering with Cloudera and supporting Hadoop connectivity within the Vertica product. Still, that doesn't make Stonebraker's writings any less biased.
