Saturday, March 17

Open source collaborative filtering and recommendation systems

Update 3/3/2010: Added Mahout

Yesterday I posted on open source text mining libraries. Today I am looking at recommendation systems, also known as collaborative filtering (CF): mining user behavior and harnessing the "wisdom of the crowds." In a nutshell, recommendation systems discover new items you might be interested in, based on your past preferences (such as explicit ratings or implicit click behavior). Their goal is to bring you new and, more importantly, interesting information without you having to search for it.

Background from Amazon
Since we are talking about recommendations, the first stop is Amazon and the creator of its original system, Greg Linden, author of Geeking with Greg. (I had the opportunity to meet Greg at SIGIR this past summer and we had some great discussions during the poster session.) Greg's "Early Amazon" posts provide fascinating insight into some of Amazon's early days. The Amazon recommendation system started as a side project that he wasn't supposed to be working on. Read the full story, and don't miss his earlier story on his first attempt at a system, BookMatcher.

Current Systems
Recently, a lot of work on distributed recommendation systems has been happening in Apache Mahout, a distributed machine learning library that uses Hadoop. The Taste recommender was incorporated into it. The first version was originally started as work on the Netflix contest (via Greg). The Mahout library has support for KNN, SVD, and frequent pattern mining using Parallel FP-Growth. Some of the recommendation algorithms are more mature than others, so expect to get your hands dirty getting some of them to work. Despite its lack of maturity, this would be my first stop if I were building a system today.
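To make the KNN idea concrete, here is a minimal single-machine sketch of a user-based nearest-neighbor recommender in plain Python. It is not Mahout's API or its implementation; the toy ratings, the `cosine` and `predict` helpers, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings for item."""
    neighbors = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted(neighbors, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in top) / total

if __name__ == "__main__":
    print(predict(RATINGS, "alice", "D"))
```

Alice's ratings match Bob's on every shared item, so Bob's rating of "D" dominates the prediction, while Carol's lower rating pulls it down somewhat. A production system would also need rating normalization and a far more scalable neighbor search, which is where a distributed library earns its keep.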

A simple content-based recommender could be built on top of a search system by taking an object and converting it into a query. See the open-source search engines.
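As a rough sketch of that idea, the Python below turns an item's description into a short query made of its most distinctive terms and runs that query against a tiny in-memory index. The toy catalogue, the TF-IDF style weighting, and every function name here are assumptions for illustration; a real system would hand the generated query to an actual search engine.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: item id -> description text.
DOCS = {
    "d1": "hadoop distributed machine learning library",
    "d2": "distributed machine learning on hadoop clusters",
    "d3": "italian pasta recipes with tomato sauce",
}

def tokenize(text):
    return text.lower().split()

def idf(docs):
    """Inverse document frequency for every term in the catalogue."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def item_to_query(item_id, docs, weights, max_terms=3):
    """Turn an item into a query: its highest-weighted (TF * IDF) terms."""
    tf = Counter(tokenize(docs[item_id]))
    ranked = sorted(tf, key=lambda t: (-tf[t] * weights[t], t))
    return ranked[:max_terms]

def search(query, docs, weights, exclude=None):
    """Rank documents by the summed weight of the query terms they contain."""
    scores = {}
    for doc_id, text in docs.items():
        if doc_id == exclude:
            continue
        terms = set(tokenize(text))
        score = sum(weights[t] for t in query if t in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    weights = idf(DOCS)
    query = item_to_query("d1", DOCS, weights)
    print(query, search(query, DOCS, weights, exclude="d1"))
```

The item being recommended from is excluded from its own results, otherwise it would always rank first.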

Other Related Work
Another specialist in this area is Daniel Lemire, a researcher at the University of Quebec in Montreal. He wrote this paper on a simple and effective recommendation engine using SQL and PHP; the code is available on the site. There is a related PHP project, Vogoo, which appears to be actively maintained. Daniel also wrote a Java version of the item-based recommender engine, Cofi.
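For a feel of how little code an item-based recommender can need, here is a small Python sketch of weighted Slope One, a scheme Lemire co-authored. That this is the algorithm in the paper linked above is my assumption, and the toy ratings and function names are hypothetical; this is not the PHP or Java code mentioned.

```python
from collections import defaultdict

# Hypothetical toy ratings: user -> {item: rating}.
RATINGS = {
    "u1": {"A": 5, "B": 3, "C": 2},
    "u2": {"A": 3, "B": 4},
    "u3": {"B": 2, "C": 5},
}

def build_deviations(ratings):
    """Average rating difference (item i minus item j) and co-rating counts."""
    diffs = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diffs[(i, j)] += ri - rj
                    counts[(i, j)] += 1
    devs = {pair: diffs[pair] / counts[pair] for pair in diffs}
    return devs, counts

def predict(user_ratings, item, devs, counts):
    """Count-weighted average of (user's rating of j + deviation of item from j)."""
    num = 0.0
    den = 0
    for j, rj in user_ratings.items():
        pair = (item, j)
        if j != item and pair in devs:
            num += (devs[pair] + rj) * counts[pair]
            den += counts[pair]
    return num / den if den else None

if __name__ == "__main__":
    devs, counts = build_deviations(RATINGS)
    print(predict(RATINGS["u3"], "A", devs, counts))
```

The deviations table depends only on pairs of items, so it can be precomputed and stored (for example in a SQL table), leaving only a cheap weighted sum at prediction time.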

CoFE (Collaborative Filtering Engine) is another open source Java-based engine, created by Jon Herlocker from Oregon State University, but I don't believe it is being maintained; it looks like it hasn't been updated since 2004.

Ray Mooney at the University of Texas has also been working on recommendation research, although his main specialties are information extraction and machine learning. Here are some of his department's publications, and here are some introductory-level slides from a recent course he taught on Information Retrieval.

That pretty much covers recommender systems for today. You can always check the Wikipedia article on Collaborative Filtering (CF) for updates. Again, many of these systems use machine learning and classification, which fits nicely with my previous post on text mining.


  1. Great post. (My first name is "Daniel" though, not David).

  2. Sorry about that, it was late :-). Should be fixed now.

    I read your blog regularly, keep up the great work.

  3. Anonymous, 2:27 AM EDT

    Great blog, i read your blogs regularly. Keep up the good work All the best.

  4. Anonymous, 12:03 PM EDT

    Actually, Amazon's recommendation engine was based on Net Perception's own software which some genius decided to give to Amazon for next to nothing.