Wednesday, February 24

Yahoo! Learning to Rank Challenge at ICML 2010

Yahoo! has announced a Learning to Rank challenge as part of the Learning to Rank Workshop at ICML 2010.

They are releasing (to participants) two large real-world datasets. The first dataset has:
29,921 queries
744,692 URLs
519 features

For details on the second set, see the website.

The URLs are rated on a graded scale from 0 (irrelevant) to 4 (perfect). The evaluation will use Normalized Discounted Cumulative Gain (NDCG) and Expected Reciprocal Rank (ERR).
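
For readers who have not met the two metrics, here is a minimal sketch of how they are commonly computed for one query's ranked list of grades. This is not the challenge's official scoring code: the gain function (2^g - 1), the log2 discount, the optional cutoff k, and the function names are all assumptions based on the usual textbook formulations, and the top grade of the scale is passed in as max_grade rather than hard-coded.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes gain 2^g - 1 and a log2(rank + 1) discount (a common choice,
    not necessarily the exact variant used by the challenge).
    """
    if k is not None:
        grades = grades[:k]
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


def err(grades, max_grade):
    """Expected reciprocal rank under the cascade user model.

    Each grade g is mapped to a stopping probability (2^g - 1) / 2^max_grade.
    max_grade is the top of the grading scale (an input, not inferred).
    """
    p_continue = 1.0  # probability the user reaches the current rank
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)
        total += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return total
```

Both functions take the grades in the order the ranker returned them, so a perfect ordering gives an NDCG of 1.0, while ERR rewards putting a single highly relevant URL near the top.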

The set includes only query and URL identifiers, not the original queries or page content, so engineering new features seems unlikely to be feasible.

The competition begins March 1st and goes through May 31st.


  1. What, no efficiency task?

  2. Are folks at UMass submitting something? I'm very tempted.

  3. Ya, efficiency/scalability is an interesting topic.

    I've heard a bit of discussion in the lab, but nothing concrete. Many of us are looking at CIKM papers and don't want to get spread too thin...

    Still, it's a very interesting dataset, so...

  4. interesting dataset which you have to delete by June 30th!

  5. Actually, I have more information now. The dataset will be available on the Yahoo Webscope release program after June 30th. The actual dataset only contains feature vectors, so no URLs or queries.

  6. Jon - I heard that Michael and Niranjan are working on something...

    I'm interested, but I have to focus on other projects at the moment.

