Friday, March 13

Lucene in Action Second Edition and Lucene 2.4.1

Grant Ingersoll points out that Lucene in Action, 2nd edition is now available for early access. Read his post for a link to get 40% off the early access edition.

The first edition was a venerable standby that is now quite dated, so I look forward to reading the new edition. I will have a review up soon.

In related news, Lucene 2.4.1 was released earlier this week. Lucene 2.4 was a major update released in October, and 2.4.1 contains critical bug fixes.

Tuesday, March 10

Jeff Dean WSDM 2009 Keynote Slides online

Special thanks to Jaap Kamps for the note letting me know that the Jeff Dean WSDM 2009 keynote slides are online.

The presentation gives a fantastic overview of the evolution of Google's search infrastructure. It's an exciting presentation with detail on query serving infrastructure and index format (see slides 49-64).

Update: The video is now on videolectures.

Live Labs Releases Image Preference Data

Jon Elsas highlighted Live Labs' release of an image preference dataset. The research behind it is described in an upcoming WWW paper, Learning Consensus Opinion: Mining Data from a Labeling Game. The preference dataset contains 427 queries. The data was collected using an ESP-like game:
The Picture This game randomly pairs two users in a game setting. The game selects queries, then the players are shown two or more images. They then decide which image is most relevant to the query.
If you're just getting started with preference data, you should check out Ben Carterette's recent publications. As part of his thesis he created a preference test collection for data from the TREC 2003 web track.
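
To make "preference data" a bit more concrete, here is a minimal sketch of how pairwise judgments could be turned into a per-query ranking. It assumes each judgment is stored as a (query, preferred image, other image) tuple and ranks images by a simple win count. Both the record format and the win-count aggregation are my own assumptions for illustration; they are not the format of the Live Labs dataset or the method from the paper, and the function name and image identifiers are made up.

```python
from collections import defaultdict


def rank_by_wins(judgments):
    """Rank images per query by how many pairwise preferences they won.

    `judgments` is an iterable of (query, preferred, other) tuples.
    This tuple layout is a hypothetical format chosen for the example.
    """
    wins = defaultdict(lambda: defaultdict(int))
    for query, preferred, other in judgments:
        wins[query][preferred] += 1
        wins[query][other] += 0  # register the loser so it still gets ranked
    # Sort by descending win count, breaking ties by image id for determinism.
    return {
        query: sorted(counts, key=lambda img: (-counts[img], img))
        for query, counts in wins.items()
    }


# Hypothetical judgments; the query and image ids are invented.
judgments = [
    ("jaguar", "img_a", "img_b"),
    ("jaguar", "img_a", "img_c"),
    ("jaguar", "img_c", "img_b"),
]
ranking = rank_by_wins(judgments)
print(ranking["jaguar"])
```

Counting wins is about the simplest aggregation possible and ignores how strong each opponent was, so treat it as a starting point rather than a recommended method.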

ICWSM Accepted Papers

The upcoming International Conference on Weblogs and Social Media (ICWSM) posted the accepted papers. It's a good preview of what the conference holds.

Wolfram Alpha: Computational Knowledge Engine for the Web

In case you were living under a rock this weekend, or in a library studying for midterms like me, the big news was Wolfram Alpha. Founder Stephen Wolfram introduced it on his blog. It's essentially a question-answering (QA) system built on top of the Mathematica engine.

Nova Spivack, CEO of Radar Networks (creator of Twine), started, or at least fueled, the hype when he wrote Wolfram Alpha is Coming -- And It Could be as Important as Google. He also did a writeup on Twine that details a demo Stephen gave him. Nova then wrote an article as a guest author on TechCrunch, which really amplified the hype. In the article Nova gives an overview of how it works:
Wolfram’s team manually entered, and in some cases automatically pulled in, masses of raw factual data about various fields of knowledge, plus models and algorithms for doing computations with the data. By building all of this in a modular fashion on top of the Mathematica engine, they have built a system that is able to actually do computations over vast data sets representing real-world knowledge. More importantly, it enables anyone to easily construct their own computations — simply by asking questions.
I'm skeptical, but I look forward to saying more once I'm able to give it a try.