Tuesday, March 9

TREC Entity Track 2009 into 2010

Krisztian posted a link to the TREC 2009 Entity Track Overview, part of the TREC 2009 proceedings.

The track website has information on the 2009 track and on what is planned for 2010. One change the organizers are inviting discussion on is a new semantic entity search subtask:
We propose a semantic entity search subtask for 2010: return URIs of related entities, instead of their homepages. We are planning to enrich topics with URIs of the input entities. URIs need to come from a predefined set of semantic data sources (which will include DBPedia and Freebase, at least).
The plan is to use the full Category A set of ClueWeb09, which contains about 500 million English web pages, instead of the smaller Category B subset, which doesn't contain many entity homepages.

Data-Intensive Text Processing with MapReduce: Updated Book Draft

An updated draft of the upcoming book Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer, is available.

The book isn't finished yet, but it already has interesting material. It emphasizes algorithms for processing text with MapReduce: co-occurrence analysis, inverted index construction, and the EM algorithm applied to estimating the parameters of HMMs.
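To make the first two of those algorithms concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It is not code from the book and not the Hadoop API; `run_mapreduce`, `index_mapper`, `make_pairs_mapper`, and the other names are hypothetical. The inverted index emits `(term, (doc_id, tf))` and builds a posting list per term. The co-occurrence job uses one common formulation, often called the "pairs" approach, which emits a count of 1 for each word pair inside an assumed sliding window.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    # Iterating keys in sorted order mimics the sorted keys a Hadoop reducer sees.
    return {k: reducer(k, groups[k]) for k in sorted(groups)}


# --- Inverted index construction (hypothetical mapper/reducer) ---
def index_mapper(doc_id, text):
    """Emit (term, (doc_id, term_frequency)) once per distinct term in a document."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def index_reducer(term, postings):
    """Build the posting list for one term, ordered by document id."""
    return sorted(postings)


# --- Co-occurrence counting, "pairs" style (window size is an assumption) ---
def make_pairs_mapper(window=2):
    def mapper(doc_id, text):
        terms = text.lower().split()
        for i, w in enumerate(terms):
            # Neighbours within `window` positions on either side, excluding w itself.
            for j in range(max(0, i - window), min(len(terms), i + window + 1)):
                if i != j:
                    yield (w, terms[j]), 1
    return mapper


def sum_reducer(key, values):
    """Sum the partial counts emitted for one word pair."""
    return sum(values)


if __name__ == "__main__":
    docs = [(1, "one fish two fish"), (2, "red fish blue fish")]

    index = run_mapreduce(docs, index_mapper, index_reducer)
    print(index["fish"])  # [(1, 2), (2, 2)]

    cooc = run_mapreduce(docs, make_pairs_mapper(window=1), sum_reducer)
    print(cooc[("fish", "two")])  # 2
```

Because the shuffle groups values by key, each reducer only ever sees one term (or one word pair), so both jobs parallelize across keys. A real Hadoop job would also have to deal with combiners, partitioning, and postings that don't fit in memory, which this sketch ignores.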

You can also see Jimmy's cloud computing course (spring 2010) and the Ivory search engine.