The Spock Entity Resolution challenge and other miscellany

The Spock Contest (via O'Reilly Radar). Spock is a new people search engine, and, like Netflix, it has launched a public contest. First, some background on Spock:
Spock is a search application that helps consumers discover more about people who matter in their lives. At the core, we organize relevant information around people and have developed unique technology to do so...With over one hundred million individuals indexed and millions added every day, Spock is the largest and most comprehensive people specific search application.
We have selected one of our most interesting problems, namely Entity Resolution, to share with the community, allowing other leading computer scientists and engineers to compete in an open contest... You can work individually and in teams. The competition will last 4 months and the winning team will win a Grand Prize of $50,000! Most importantly you’ll be working on a very important and widely applicable problem. We will also be issuing prizes for 2nd and 3rd place.

The dataset is 1.5 GB compressed. Time to dig a little deeper... more soon.
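To make the task concrete: entity resolution means deciding which records, such as web pages that mention the same name, refer to the same real-world person. The toy sketch below is not Spock's method or data format. The records, the bag-of-words Jaccard similarity, and the 0.3 linking threshold are all hypothetical choices for illustration. Records whose similarity meets the threshold are linked, and each connected component (found with union-find) is treated as one person.

```python
"""Toy entity-resolution sketch (hypothetical data, features, and threshold):
cluster pages that mention the same name into distinct people."""


def tokens(text):
    # Lowercased word set, used as a crude "context signature" for a page.
    return set(text.lower().split())


def jaccard(a, b):
    # Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(records, threshold=0.3):
    """Return clusters (lists of record indices), one per inferred entity.

    Two records are linked when their Jaccard similarity is >= threshold;
    clusters are the connected components of that link graph."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sigs = [tokens(r) for r in records]
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(sigs[i], sigs[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical pages that all mention "john smith" but describe two people.
pages = [
    "john smith plumber boston pipes repair",
    "john smith boston plumber emergency repair",
    "john smith physics professor oxford",
    "john smith oxford physics quantum",
]
print(resolve(pages))
```

A real system would need far richer features than shared words (the shared name tokens alone already inflate every pairwise score here) and could not afford all-pairs comparison at Spock's scale of over one hundred million people; the sketch only shows the shape of the problem.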

Microsoft Research Asia has released LETOR (LEarning TO Rank), a package of benchmark datasets for building and testing machine-learning-based ranking algorithms. The goal is to give researchers a common platform on which to compare the effectiveness of their ML-based ranking systems, using a standard set of benchmarks.
Ranking is the central problem for many applications, and using machine learning technologies to learn the ranking function has been a promising research direction. However, the lack of public benchmark datasets (e.g. standard features, relevance judgments, data partitioning, and evaluation metrics) makes the existing work difficult to be compared with each other...We benchmarked several state-of-the-arts ranking models with these features and provide baseline results for future studies.

Found via a blog post on the topic by Fernando Diaz, a grad student at UMass CIIR.
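One reason standard evaluation metrics matter is that ranking metrics come in several variants. As an illustration, here is a minimal sketch of NDCG@k, a metric widely used for graded relevance judgments. It assumes the common gain of 2^rel - 1 and a log2(position + 1) discount; LETOR's own evaluation scripts may differ in details such as tie handling or queries with no relevant documents, so treat this as an assumption rather than their definition.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items.

    Gain is 2**rel - 1; the discount is log2(position + 1), positions from 1."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 1)
        for pos, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by DCG of the ideal (sorted) ranking.

    Returns 0.0 when nothing in the list is relevant (ideal DCG is 0)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels (0 = irrelevant, 2 = highly relevant) for one
# query, listed in the order a ranker returned the documents.
ranked = [2, 0, 1, 0]
print(round(ndcg_at_k(ranked, 3), 4))
```

Normalizing by the ideal ranking puts every query on a 0 to 1 scale, so scores can be averaged across queries with different numbers of relevant documents.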

AIRWeb 2007 Papers Announced

Also, for the latest in web spam research, the accepted papers for AIRWeb 2007 (Adversarial Information Retrieval on the Web) are now online. Search Engine Land has a great article on them, with links to and descriptions of every paper, something the AIRWeb website itself lacks. One of the primary organizers of AIRWeb is Brian Davison, who is presenting a paper on link filtering at the conference: "Measuring Similarity to Detect Qualified Links."

