Friday, July 24

Ivory: A New MapReduce Indexing and Retrieval System

To coincide with the SIGIR MapReduce Tutorial, Jimmy Lin announced the release of Ivory, an open-source MapReduce search platform. It is a web-scale indexing and retrieval system built on top of Hadoop and, like Hadoop itself, written in Java.
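For readers new to the idea, here is a minimal sketch of how inverted indexing fits the MapReduce model. It is not Ivory's code and does not use Hadoop: it is a small in-memory Python simulation, and every function name in it (`map_document`, `shuffle`, `reduce_postings`, `build_index`) is hypothetical. The mapper emits one (term, posting) pair per distinct term in a document, the shuffle step groups postings by term, and the reducer sorts each group into a postings list.

```python
from collections import Counter, defaultdict


def map_document(doc_id, text):
    """Map phase: emit (term, (doc_id, term_frequency)) for each distinct term."""
    counts = Counter(text.lower().split())
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for term, posting in pairs:
        grouped[term].append(posting)
    return grouped


def reduce_postings(term, postings):
    """Reduce phase: sort postings by document id to form the postings list."""
    return term, sorted(postings)


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_document(doc_id, text)
    )
    return dict(reduce_postings(t, p) for t, p in shuffle(pairs).items())


# Toy collection, made up for illustration only.
docs = {
    1: "ivory indexes the web",
    2: "the web is large",
    3: "ivory runs on hadoop",
}
index = build_index(docs)
```

With this toy collection, `index["ivory"]` holds postings for documents 1 and 3. A real system would also have to handle tokenization, compression, and partitioning of the index across machines, none of which this sketch attempts.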

For retrieval it uses Don Metzler's Java implementation of Searching using Markov Random Fields (SMRF). You can read his publications on the topic. It's exciting to finally get a chance to play with the implementation of one of the state-of-the-art retrieval tools. To my knowledge this is the first time Don's Java MRF toolkit for retrieval has been available to the public.

Ivory is aimed at IR researchers as a platform for experimentation. This is an early release with a lot of rough edges.

Jimmy is using Ivory to index the English portion of the ClueWeb09 dataset, roughly 500 million documents, for the TREC web track.
