Mahout 0.2 has several key new features that are worth taking a look at:
- Latent Dirichlet Allocation (LDA) - (JIRA information) - LDA is a form of Bayesian topic modeling, a type of clustering that discovers hidden "topics" in a collection of documents. See the original LDA paper by Blei et al. and the Mallet LDA implementation from UMass (which does not use MapReduce).
- K-Nearest Neighbor (KNN) and Singular Value Decomposition (SVD) based recommender - (JIRA) - The implementation is based on a paper by the Netflix Prize-winning team, "Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights".
- Random Forests - (JIRA) - An ensemble classifier that combines the votes of many decision trees. See Breiman's description for background on Random Forests.
- Frequent Itemset Pattern Miner - (JIRA) - An algorithm that analyzes how often items co-occur in shopping baskets (transactions), so that the results can be used to suggest new items. This is an implementation of the Parallel FP-Growth algorithm.
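For readers new to LDA, the generative process from Blei et al. can be written as follows. This uses the commonly cited smoothed form and a common notation, not any particular Mahout class: K is the number of topics, α and β are Dirichlet hyperparameters, θ_d is document d's mixture over topics, and φ_k is topic k's distribution over words.

```latex
\begin{aligned}
\phi_k &\sim \operatorname{Dirichlet}(\beta), && k = 1,\dots,K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && \text{for each document } d \\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d), && \text{for each word position } n \text{ in } d \\
w_{d,n} &\sim \operatorname{Categorical}(\phi_{z_{d,n}})
\end{aligned}
```

Fitting the model runs this story in reverse: given only the observed words $w_{d,n}$, infer the hidden $\theta_d$, $z_{d,n}$ and $\phi_k$. The learned $\phi_k$ are the "topics".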
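To make the neighborhood idea behind the KNN recommender concrete, here is a minimal Java sketch of item-based k-nearest-neighbor rating prediction. It is not Mahout's API and not the Netflix paper's method. The class `ItemKnn`, its methods, and the convention that a rating of 0 means "unrated" are hypothetical. The prediction uses the classic cosine-similarity-weighted average rather than the paper's jointly derived interpolation weights, which are obtained by solving a small least-squares problem. The SVD side is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical item-based k-nearest-neighbor rating predictor (illustrative only). */
class ItemKnn {
    // ratings[u][i] is user u's rating of item i; 0 means "not rated" (an assumption of this sketch).
    private final double[][] ratings;

    ItemKnn(double[][] ratings) { this.ratings = ratings; }

    /** Cosine similarity of items a and b, computed over users who rated both. */
    double similarity(int a, int b) {
        double dot = 0, na = 0, nb = 0;
        for (double[] row : ratings) {
            if (row[a] != 0 && row[b] != 0) {
                dot += row[a] * row[b];
                na += row[a] * row[a];
                nb += row[b] * row[b];
            }
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Predicts user u's rating of item i from the k items most similar to i that u has rated. */
    double predict(int u, int i, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (int j = 0; j < ratings[u].length; j++) {
            if (j != i && ratings[u][j] != 0) candidates.add(j);
        }
        // Sort candidate neighbors by descending similarity to item i and keep the top k.
        candidates.sort(Comparator.comparingDouble((Integer j) -> similarity(i, j)).reversed());
        double num = 0, den = 0;
        for (int j : candidates.subList(0, Math.min(k, candidates.size()))) {
            double w = similarity(i, j);
            num += w * ratings[u][j];   // similarity-weighted sum of the user's own ratings
            den += Math.abs(w);
        }
        return den == 0 ? 0 : num / den;
    }
}
```

For example, with the ratings matrix `{{5, 4, 0}, {5, 4, 1}, {1, 2, 5}}`, `new ItemKnn(r).predict(0, 2, 1)` returns 4.0, because item 1 is the nearest neighbor of item 2 among the items user 0 has rated.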
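The core ideas of Breiman's Random Forests can be illustrated with a deliberately tiny Java sketch: bagging (each tree is trained on a bootstrap sample), random feature selection, and majority voting. It is a toy and not Mahout's implementation. The class `StumpForest` is hypothetical, each "tree" is a single-split stump restricted to one randomly chosen feature, and labels are assumed to be 0 or 1. Real random forests grow full, unpruned trees and draw a fresh random feature subset at every split.

```java
import java.util.Random;

/** Toy random forest of depth-1 trees (stumps); hypothetical, for illustration only. */
class StumpForest {
    private final int[] feature;                // feature index each stump splits on
    private final double[] threshold;           // split point on that feature
    private final int[] leftLabel, rightLabel;  // predicted label (0 or 1) on each side

    StumpForest(double[][] x, int[] y, int numTrees, long seed) {
        Random rnd = new Random(seed);
        int n = x.length, d = x[0].length;
        feature = new int[numTrees];
        threshold = new double[numTrees];
        leftLabel = new int[numTrees];
        rightLabel = new int[numTrees];
        for (int t = 0; t < numTrees; t++) {
            // Bagging: draw n row indices with replacement.
            int[] sample = new int[n];
            for (int s = 0; s < n; s++) sample[s] = rnd.nextInt(n);
            // Random feature selection: this stump may only split on one random feature.
            feature[t] = rnd.nextInt(d);
            fitStump(t, x, y, sample, feature[t]);
        }
    }

    // Tries each sampled value of feature f as a threshold and keeps the one with fewest errors.
    private void fitStump(int t, double[][] x, int[] y, int[] sample, int f) {
        int bestErrors = Integer.MAX_VALUE;
        for (int c : sample) {
            double thr = x[c][f];
            int[][] counts = new int[2][2]; // counts[side][label]; side 0 means value <= thr
            for (int s : sample) counts[x[s][f] <= thr ? 0 : 1][y[s]]++;
            int left = counts[0][1] > counts[0][0] ? 1 : 0;   // majority label on the left
            int right = counts[1][1] > counts[1][0] ? 1 : 0;  // majority label on the right
            int errors = counts[0][1 - left] + counts[1][1 - right];
            if (errors < bestErrors) {
                bestErrors = errors;
                threshold[t] = thr;
                leftLabel[t] = left;
                rightLabel[t] = right;
            }
        }
    }

    /** Majority vote over all stumps. */
    int predict(double[] point) {
        int votes = 0;
        for (int t = 0; t < feature.length; t++) {
            votes += point[feature[t]] <= threshold[t] ? leftLabel[t] : rightLabel[t];
        }
        return 2 * votes > feature.length ? 1 : 0;
    }
}
```

On two well-separated clusters, nearly every stump splits its bootstrap sample cleanly, so the vote recovers the cluster labels even if a stump trained on an unlucky single-class sample votes wrongly.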
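The pattern-growth idea at the heart of FP-Growth can be sketched in sequential Java: count the items in a (conditional) database, extend the current prefix with each frequent item, and recurse on the transactions that contain it. To stay short, this sketch omits the FP-tree itself (the compressed prefix tree that makes FP-Growth fast) and the parallel sharding that Parallel FP-Growth adds, and it orders items lexicographically instead of by descending frequency. The class `PatternGrowth` is hypothetical, not Mahout's API.

```java
import java.util.*;

/** Minimal pattern-growth frequent-itemset miner (illustrative; not Mahout's code). */
class PatternGrowth {
    /** Returns every itemset whose support (number of containing transactions) is >= minSupport. */
    static Map<Set<String>, Integer> mine(List<List<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> result = new HashMap<>();
        grow(transactions, minSupport, new ArrayList<>(), result);
        return result;
    }

    // db is the conditional database for 'prefix': the transactions that contained the prefix,
    // restricted to items that sort after the prefix's last item.
    private static void grow(List<List<String>> db, int minSupport, List<String> prefix,
                             Map<Set<String>, Integer> result) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> t : db)
            for (String item : new HashSet<>(t)) counts.merge(item, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSupport) continue;
            String item = e.getKey();
            List<String> next = new ArrayList<>(prefix);
            next.add(item);
            result.put(new HashSet<>(next), e.getValue());
            // Project: keep transactions containing 'item', and only their items sorting after it.
            List<List<String>> projected = new ArrayList<>();
            for (List<String> t : db) {
                if (!t.contains(item)) continue;
                List<String> tail = new ArrayList<>();
                for (String other : t) if (other.compareTo(item) > 0) tail.add(other);
                projected.add(tail);
            }
            grow(projected, minSupport, next, result);
        }
    }
}
```

On the five transactions `[bread, milk]`, `[bread, diapers, beer, eggs]`, `[milk, diapers, beer, cola]`, `[bread, milk, diapers, beer]` and `[bread, milk, diapers, cola]` with `minSupport = 3`, it returns eight itemsets, including `{beer, diapers}` with support 3.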