Thursday, April 2

Amazon Elastic MapReduce with Hadoop

Today Amazon announced a new service, Elastic MapReduce, in a blog post.

Although people have been running Hadoop on EC2 for some time, the new service simplifies the process.

Processing in Elastic MapReduce is organized around the concept of a Job Flow, and each Job Flow contains one or more Steps. Each Step reads its input data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), runs the map and reduce work, and then writes the results back to S3.
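
To make that concrete, here is a rough sketch of what submitting a Job Flow with a single streaming Step could look like from Python using the boto library's EMR support. The bucket names, script paths, and instance count are my own placeholders, not anything from Amazon's announcement.

    # Sketch only: assumes boto's EMR module and hypothetical S3 bucket names,
    # mapper/reducer scripts, and instance count.
    from boto.emr.connection import EmrConnection
    from boto.emr.step import StreamingStep

    conn = EmrConnection()  # picks up AWS credentials from the environment or boto config

    # One Step: read input from S3, run a streaming mapper and reducer on the
    # Hadoop cluster, and write the output back to S3.
    step = StreamingStep(
        name='Word count',
        mapper='s3n://example-bucket/scripts/mapper.py',
        reducer='s3n://example-bucket/scripts/reducer.py',
        input='s3n://example-bucket/input/',
        output='s3n://example-bucket/output/')

    # The Job Flow spins up the requested EC2 instances running Hadoop, runs
    # each Step in order, and writes its logs to S3.
    jobflow_id = conn.run_jobflow(
        name='Example job flow',
        log_uri='s3n://example-bucket/logs/',
        num_instances=4,
        steps=[step])
    print(jobflow_id)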
One interesting note is that it includes Hadoop's Aggregate package, a library that supports commonly used reduce operations such as counts, sums, and value histograms.
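
To give a sense of how the Aggregate package is used (an example of mine, not from the announcement): with Hadoop streaming, the mapper emits lines of the form "function:key<TAB>value" and the job is run with the built-in "-reducer aggregate", which performs the corresponding reduction. A word count then needs no custom reducer at all:

    #!/usr/bin/env python
    # Sketch of a streaming mapper that follows the Aggregate package's protocol.
    # Run the job with "-reducer aggregate"; the LongValueSum prefix tells the
    # aggregate reducer to sum the values emitted for each key.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("LongValueSum:%s\t1" % word)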
