Wednesday, June 10

Hadoop Summit Coverage: State of Hadoop

Presented by Owen O'Malley and Eric Baldeschwieler (Yahoo!)

Owen gave a brief overview of the history of Hadoop and the Hadoop ecosystem.

Yahoo! Hadoop distribution

The big news is that Eric announced, on stage, the release of the Yahoo! distribution of Hadoop, the same distribution that Yahoo! uses internally. It's not new software, but a repackaging of publicly available releases that have been tested internally on Yahoo!'s clusters. They're starting with the 0.20 release of Hadoop as an alpha release, which will continue to grow and stabilize.

Yahoo! now has dozens of clusters totaling more than 25,000 nodes. The largest cluster has about 4,000 nodes.

Contains content and metadata that powers Yahoo! search.

In 2008
  • 70 hours runtime
  • 300 TB shuffling
  • 200 TB output
In 2009
  • 73 hours
  • 490 TB shuffling
  • 280 TB output
  • 55%+ hardware
Cluster stats
In 2008
  • 2000 nodes
  • 6 PB raw disk
  • 16 TB RAM
  • 16k CPUs
In 2009
  • 4000 nodes
  • 16 PB disk
  • 64 TB RAM
  • 32k CPUs (40% faster cpus)
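
To put the cluster stats above in per-node terms, here is a quick back-of-the-envelope sketch. The figures are copied from the notes; the decimal unit conversions (1 PB = 1000 TB, 1 TB = 1000 GB, 16k = 16,000) and the `per_node` helper are my own assumptions for illustration, not something shown in the talk.

```python
# Cluster figures as reported in the talk notes.
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and "k" = 1000.
clusters = {
    2008: {"nodes": 2000, "disk_pb": 6, "ram_tb": 16, "cpus": 16_000},
    2009: {"nodes": 4000, "disk_pb": 16, "ram_tb": 64, "cpus": 32_000},
}


def per_node(stats):
    """Return (disk in TB, RAM in GB, CPU count) per node for one cluster."""
    nodes = stats["nodes"]
    return (
        stats["disk_pb"] * 1000 / nodes,  # PB -> TB, then divide across nodes
        stats["ram_tb"] * 1000 / nodes,   # TB -> GB, then divide across nodes
        stats["cpus"] / nodes,
    )


for year, stats in clusters.items():
    disk_tb, ram_gb, cpus = per_node(stats)
    print(f"{year}: {disk_tb:.0f} TB disk, {ram_gb:.0f} GB RAM, {cpus:.0f} CPUs per node")
```

Under those assumptions, the node count doubled while RAM per node also doubled and disk per node grew by about a third; the CPU count per node stayed flat, with the notes attributing the gain there to faster CPUs.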
Major features coming to Hadoop
  • Backwards compatibility (0.21 will make the last big API changes)
  • Append, sync, and flush support
  • Scheduling - Capacity and Fairshare
  • Continuous integration - easier to build and test Hadoop distributions
Pig - "Make pigs fly"
  • Support for SQL and metadata
  • Column oriented storage access layer (a new column-oriented storage view, not services like HBase)
  • Multi-query optimizations
Oozie is a new workflow and scheduling system

There was a question about cluster management. Eric recommended that people use Chukwa and Ganglia as open-source tools for large-scale cluster management.

