Monday, August 10

Hadoop Summit Video Roundup

Yahoo! has posted several new videos from the Hadoop Summit held in June. Here's a roundup with links to the videos posted so far:
  • State of Hadoop
    Owen O'Malley, Eric Baldeschwieler, and Yahoo!'s Hadoop team talk about their work with Hadoop over the last year, including core capabilities and related sub-projects, deployment experiences, and future directions.

  • HBase Goes RealTime
    HBase is a storage system that's built on top of HDFS. The guiding philosophy of their release: to unjava-fy everything. Some of the major changes: new key format, new file format (HFile), new query API, new result API and optimized serialization, new scanner abstractions, and new concurrent LRU block cache.

  • Hive
    In this talk, Namit Jain and Zheng Shao discuss how and why Facebook uses Hive. They present Hive's progress and roadmap and describe how the open source community can contribute to its evolution. In March 2008 the service was generating about 1TB per day; by mid-2009, data production had increased to 10TB per day.

  • Hadoop Futures Panel
    Yahoo!'s Sanjay Radia discusses backwards compatibility and the future of HDFS; Owen O'Malley covers MapReduce and security futures; Doug Cutting, the father of Hadoop, talks about Avro, a serialization system; Cloudera's Tom White discusses tools and usability; Facebook's Joydeep Sen Sarma talks about Hive; and Yahoo!'s Alan Gates looks at Pig, SQL, and metadata.

  • Scaling Hadoop for multi-core and highly threaded Systems
    The speakers present the basic architecture of CMT (chip multithreading) processors, designed by Sun for maximum throughput, and then describe the work the team did using Hadoop and other virtualization technologies to help scale CMT.

  • Running Hadoop in the Cloud by Tom White
    He opens with a discussion of the Berkeley RAD Lab paper on cloud computing and walks us through a set of definitions to a discussion of the public cloud. He sees a realm of interesting possibilities: an apparently infinite resource, the elimination of user commitment, and the pay-as-you-go model, which enables elasticity. Tom then describes the implementation of Hadoop in this landscape.

  • Amazon Elastic MapReduce
    Amazon Web Services (AWS) evangelist Jinesh Varia presents Amazon's Elastic MapReduce, a web service that simplifies large-scale data processing for a growing ecosystem of AWS users.

  • The Growing Hadoop Community
    Cloudera co-founder Christophe Bisciglia takes a detailed look at the growth and evolution of Hadoop technology and community over the past year.

1 comment:

  1. Anonymous, 11:07 AM EDT

    Thank you for the links. Quite useful.