Monday, August 3

The future of Hadoop: Don't panic, yet

The recent MS-Y! deal has a lot of people scared about what it means for Y!'s support of Hadoop. Y! search uses Hadoop to create its "WebMap" and is the largest Hadoop customer. See my previous coverage of the State of Hadoop talk at the Hadoop Summit, where search was one of the primary featured applications. In fact, the cost of running and continually expanding the search clusters was likely a factor in Carol's decision to stop investing in search and to reallocate resources elsewhere.

Given the importance of Hadoop as an infrastructure tool for search, there is a lot of uncertainty about the future. For example, The Register ran an article titled "Microsoft pact holds gun to stuffed elephant." To counter the uncertainty and fear, Eric has a post telling people not to panic and saying that Yahoo! is still very committed to using Hadoop for infrastructure.

Despite this reassurance, Hadoop is losing a big customer, one that drives the requirements and changes that make it a better platform for building search applications, unless a miracle happens and some variant is adopted by Microsoft. The loss may not have a short-term impact, but it will change the long-term direction of the project as it focuses on being relevant to other teams and problems that are aligned with Y!'s new goals and strategies.


  1. But MS bought Powerset and Powerset uses HBase, and HBase sits on top of Hadoop.
    So maybe the question is whether Bing uses Powerset technology, and thus, at least indirectly, HBase, and Hadoop.

  2. I don't have first-hand knowledge, but the articles I've read report that Powerset is still using HBase, and that there are no plans to expand its usage to power more of Bing. Hadoop/HBase usage is limited to a very small sliver of the overall setup.

  3. Microsoft already has Dryad (and also DryadLINQ), which have some nice features. Both Dryad and DryadLINQ are available to researchers under an Academic Release License. I heard that there is a paper in the pipeline comparing Hadoop and Dryad.