Thursday, April 2

Amazon Elastic MapReduce with Hadoop

Today Amazon announced a new service, Elastic MapReduce, in a blog post.

Although people have been running Hadoop on EC2 for a while, the new service simplifies the process.
Processing in Elastic MapReduce is centered around the concept of a Job Flow. Each Job Flow can contain one or more Steps. Each step inhales a bunch of data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), does all of the work, and then writes the results back to S3.
One interesting note is that it includes the Aggregate package, a library that supports commonly used reduce operations: count, sum, value histograms, etc.
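
To make those reduce operations concrete, here is a minimal sketch in plain Java of what a count and a per-key sum compute. This is not the Aggregate package's API. The class and method names (AggregateSketch, count, sumByKey) are made up for illustration, and everything runs in memory rather than on a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not the Hadoop Aggregate package API.
public class AggregateSketch {

    // Count: number of occurrences of each key (also a simple value histogram).
    static Map<String, Long> count(List<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Sum: total of the values that share a key.
    // keys[i] is paired with values[i]; both arrays must have the same length.
    static Map<String, Long> sumByKey(String[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, Long> sums = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            sums.merge(keys[i], values[i], Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(count(words));

        String[] keys = {"x", "y", "x"};
        long[] values = {10, 5, 7};
        System.out.println(sumByKey(keys, values));
    }
}
```

In a real job the grouping by key happens in the shuffle between map and reduce, so each reducer only sees the values for its own keys. The TreeMap here just stands in for that grouping.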

Tuesday, March 31

Wikia Search is dead, for now

Jimmy Wales has a post on his blog explaining that, due to the current economic climate and the need for profitability, Wikia Search is ending. He hints, though, that he would like to resurrect the project when economic conditions improve.

Wikia Search was ambitious. It acquired the Grub crawling engine and attempted to build an open and transparent search platform on Nutch.

It's an interesting coincidence that Nutch just released version 1.0 after almost 2 years without a release. I'm glad to see the project isn't dead; I had almost written it off.

JDK 7 Significant Array Speed Gains

The guys over at Lingpipe have been testing an early version of JDK 7 with impressive results.

It appears that array access times have been significantly sped up using techniques to avoid costly array bounds checking. Read their post for the details.
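
For anyone unfamiliar with the idea, here is a rough sketch of the kind of loop where a JIT compiler may be able to drop bounds checks, next to one where it usually cannot. This is not the Lingpipe benchmark, and the class and method names are my own. Whether the checks actually get eliminated depends on the JVM and its version.

```java
// Illustrative only: hypothetical names, not taken from the Lingpipe post.
public class BoundsCheckSketch {

    // The index runs from 0 to data.length - 1, so every access is provably
    // in range. A JIT compiler may use that to skip the per-access check.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // The indices come from a second array, so they cannot be proven in range
    // ahead of time. Each data[...] access generally still needs its check.
    static long sumIndirect(int[] data, int[] indices) {
        long total = 0;
        for (int i = 0; i < indices.length; i++) {
            total += data[indices[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }

        // Crude timing only; a serious benchmark needs warm-up and repeated runs.
        long start = System.nanoTime();
        long total = sum(data);
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println("sum=" + total + " in " + elapsedMicros + " us");
    }
}
```

Either way the language semantics stay the same: an out-of-range index still throws ArrayIndexOutOfBoundsException. The optimization only removes checks the compiler can prove are redundant.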

Perhaps with the upcoming release we can finally put the "Java is slow" meme to rest.