Thursday, November 12

Machine Learning Talk: Lee Spector on Genetic Programming; Applications to Learning Ranking Functions

Today at the Yahoo!-sponsored machine learning lunch, Lee Spector presented his work on genetic programming. His talk, "Expressive Languages for Evolved Programs," highlighted his work using the Push programming language to solve interesting and hard real-world problems.

He pointed to two key principles, based on observations from biology, that these systems need in order to learn solutions:
  • Meaningful variation - variation can't just be random; mutations and selection have to produce meaningful effects in the problem domain.

  • Heritability - children need to be able to inherit desirable features from their parents without being clones.

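The two principles can be illustrated with a toy tree-based GP, a sketch of my own in Python (not Spector's Push system), evolving an arithmetic expression toward the target f(x) = x² + 1:

```python
import random

# Toy GP sketch. Variation and inheritance act on whole subtrees: a mutation
# swaps a subtree for a fresh one (a meaningful edit in the expression domain),
# and crossover grafts a parent's subtree into a child (heritable features
# without cloning).

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth=3):
    """Grow a random expression tree of bounded depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def mutate(tree):
    """Meaningful variation: replace one random subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth=2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)

def crossover(a, b):
    """Heritability: the child keeps most of a's structure, inheriting a subtree of b."""
    donor = random.choice(list(subtrees(b)))
    def splice(t):
        if not isinstance(t, tuple) or random.random() < 0.3:
            return donor
        op, left, right = t
        if random.random() < 0.5:
            return (op, splice(left), right)
        return (op, left, splice(right))
    return splice(a)

def fitness(tree, xs):
    """Squared error against the toy target f(x) = x^2 + 1 (lower is better)."""
    return sum((evaluate(tree, x) - (x * x + 1.0)) ** 2 for x in xs)

random.seed(0)
xs = [i / 4.0 for i in range(-8, 9)]
pop = [random_tree() for _ in range(60)]
for _ in range(30):
    pop.sort(key=lambda t: fitness(t, xs))
    parents = pop[:20]  # truncation selection
    pop = (parents
           + [mutate(random.choice(parents)) for _ in range(20)]
           + [crossover(random.choice(parents), random.choice(parents)) for _ in range(20)])
pop.sort(key=lambda t: fitness(t, xs))
print("best error:", round(fitness(pop[0], xs), 4))
```

Because both operators work on whole subtrees, a child can keep a parent's useful sub-expression intact while still differing from it, which is exactly the combination of variation and heritability the talk emphasized.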
During the talk, it struck me that an obvious application would be using GP to learn IR ranking functions. Ronan Cummins recently did some work in this area; his SIGIR 2009 paper, Learning in a Pairwise Term-Term Proximity Framework for Information Retrieval, applied GP to learning proximity functions.

I think there's still interesting work to be done combining GP with IR. For example, one problem is that collections and users evolve over time, but most ranking functions are static.


  1. 1. I think there was a GA-based system at TREC a few years ago.

    2. Be careful with biological metaphor. Evolution—both real and artificial—has resulted in many suboptimal solutions.

    3. Many popular models are indeed static. This does not mean that their parameters cannot be adapted (cf. online learning) or their predictions adjusted (cf. Bayesian IR). If you add work on RF/IF, the system becomes very dynamic.

    a. Online learning for IR

    There is a fair amount of work here, though probably the most recent is from the Cornell group:

    [1] Kleinberg, R., Radlinski, F., and Joachims, T. Learning diverse rankings with multi-armed bandits. In International Conference on Machine Learning (ICML) (2008).
    [2] Yue, Y., and Joachims, T. Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning (New York, NY, USA, 2009), ACM, pp. 1201–1208.
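    To make the online-learning framing concrete, here is a generic epsilon-greedy bandit sketch of my own (an assumption for illustration, not the interleaving or dueling methods of [1] or [2]): treat each candidate ranking function as an arm and learn from simulated click feedback.

```python
import random

# Epsilon-greedy bandit over three hypothetical rankers. The click
# probabilities below are made up; in a real system the reward would come
# from user feedback on the served ranking.

random.seed(0)
click_prob = [0.2, 0.5, 0.35]  # hidden quality of each candidate ranker
counts = [0] * 3               # pulls per arm
values = [0.0] * 3             # running mean reward per arm
epsilon = 0.1

for t in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)                      # explore
    else:
        arm = max(range(3), key=lambda i: values[i])   # exploit current best
    reward = 1.0 if random.random() < click_prob[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best = max(range(3), key=lambda i: values[i])
```

    The estimated values converge toward the hidden click probabilities, so the system keeps adapting as feedback arrives, which is the dynamic behavior the comment above is pointing at.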

    b. Bayesian IR

    This is my current favorite reference on the subject:

    [3] Lenk, P. J., and Floyd, B. D. Dynamically updating relevance judgements in probabilistic information systems via users’ feedback. Management Science 34, 12 (December 1988), 1450–1459.

    Lenk and Floyd cast the problem in a way which is very appealing for folks working with ten blue links.

  2. One huge criticism of GP is training time. I don't have any first-hand experience, but GP systems are generally considered too slow for most reasonably sized problems.

    There was a GP paper at one of the LETOR workshops a couple years ago:
    "Learning to Rank for Information Retrieval Using Genetic Programming"
    by Yeh et al.
    (apologies for the weird link redirect through Google sites. Couldn't find another version)

  3. I think GP has its merits for IR, and can actually achieve reasonably good results (at least as far as I can see from the papers suggested by Jeff and Jon).

    I like two things about GP: it is intrinsically easy to parallelize, and it is generally well suited to nonconvex functions (I only have Wikipedia's "Genetic algorithm" article to back me up on this, so please correct me if I am wrong :) )

    The thing that's not so nice about GP is that it's quite hard to understand what an evolved solution is really doing, as the final scoring function in Ronan's paper demonstrates.
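    On the parallelization point: fitness evaluation dominates GP run time and is embarrassingly parallel, since each individual is scored independently of the others. A minimal sketch of my own (a toy linear "program" stands in for an evolved tree; the thread pool is just to show the shape of the parallelism):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(candidate):
    # Squared error of a linear "program" y = a*x + b against the made-up
    # target y = 3x + 1. A real GP would evaluate an evolved expression here.
    a, b = candidate
    return sum((a * x + b - (3.0 * x + 1.0)) ** 2 for x in range(10))

random.seed(1)
population = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(100)]

# Score the whole population concurrently; map preserves input order, and no
# individual's score depends on any other's, so this scales out trivially.
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = list(pool.map(fitness, population))

best_score, best_candidate = min(zip(scores, population))
```

    Swapping the thread pool for a process pool or a cluster scheduler changes nothing about the algorithm, which is why the parallelization claim holds.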

  4. GPs deal with nonconvexity by naively applying stochasticity. There are smarter ways of dealing with nonconvexity.