Wednesday, December 8

New Book: Mining of Massive Datasets

Anand Rajaraman and Jeffrey D. Ullman have put together a new ebook, Mining of Massive Datasets. The book builds on the course materials for the Stanford CS345 course "Web Mining" and the CS246 class, Mining Massive Data Sets.

From the ToC, the book covers:
  1. An introduction to data mining
  2. Large-scale processing with distributed file systems and MapReduce
  3. Similarity search: nearest neighbor, minhashing, LSH, etc.
  4. Algorithms for mining streaming data
  5. (Web) Graph analysis: PageRank, HITS, and spam detection
  6. Frequent Itemset algorithms
  7. Clustering Algorithms
  8. Advertising on the web
  9. Recommendation Systems
It is an interesting blend of topics that are not usually taught together. I look forward to examining it in more detail.

