Yahoo! has announced a Learning to Rank challenge as part of the Learning to Rank Workshop at ICML 2010.
They are releasing (to participants) two large real-world datasets. The first dataset has:
For details on the second set, see the website.
The URLs are rated on a five-point graded scale, from 0 (irrelevant) to 4 (perfect). Submissions will be evaluated with Normalized Discounted Cumulative Gain (NDCG) and Expected Reciprocal Rank (ERR).
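To make the two metrics concrete, here is a minimal Python sketch of NDCG and ERR in their commonly used forms. The exponential gain `2^g - 1`, the `log2` rank discount, the function names, and the `max_grade` parameter are assumptions for illustration. The challenge's official scoring may differ in details such as truncation depth.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of grades listed in ranked order.

    Assumes exponential gain (2^g - 1) and a log2(rank + 1) discount.
    """
    if k is not None:
        grades = grades[:k]
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    # A query with no relevant URLs has no meaningful ordering; score it 0.
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


def err(grades, max_grade):
    """Expected reciprocal rank under a cascade user model.

    A grade g is mapped to a satisfaction probability
    (2^g - 1) / 2^max_grade; the user scans down the list and stops
    at the first URL that satisfies them.
    """
    p_reach = 1.0  # probability the user reaches the current rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        p_satisfied = (2 ** g - 1) / 2 ** max_grade
        score += p_reach * p_satisfied / rank
        p_reach *= 1 - p_satisfied
    return score
```

Both functions take the relevance grades of one query's URLs in the order the model ranked them, for example `ndcg([3, 0, 2], k=10)` or `err([3, 0, 2], max_grade=4)`. A per-query score would then be averaged over all queries in the set.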
The set includes only query and URL identifiers, not the original query text or page content, so engineering new features from the raw data seems unlikely.
The competition begins March 1st and goes through May 31st.