Monday, December 28

Dean and Ghemawat Strike Back on MapReduce

Jeff Dean and Sanjay Ghemawat wrote an article for the January edition of CACM, MapReduce: A Flexible Data Processing Tool. In it, they refute the findings of A Comparison of Approaches to Large-Scale Data Analysis. The authors of that paper also wrote a post on their blog bashing MapReduce: MapReduce, A major step backwards. The post is no longer available, but thankfully Greg had good coverage.

In the article, Dean and Ghemawat address the paper and attempt to debunk its claims, although they lack the benchmarks to back up their rebuttal. In the process, they explain how to run M/R jobs efficiently:
  1. Avoid starting a new process for each job; reuse workers.
  2. Shuffle data carefully to avoid O(M*R) disk seeks.
  3. Beware of text storage formats.
  4. Use natural indices, like timestamps in file names.
  5. Do not merge reducer output.
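To see why the shuffle point matters, here is some back-of-the-envelope arithmetic. It is a minimal sketch under the simplifying assumption that every map task writes one partition per reducer and every reducer fetches its partition from each map task separately, with intermediate data spread evenly. The task counts and data size are hypothetical, not figures from the article.

```python
def shuffle_fetches(num_maps: int, num_reduces: int) -> int:
    """Each of the R reducers pulls one partition from each of the M map outputs: M*R fetches."""
    return num_maps * num_reduces


def avg_fetch_bytes(total_intermediate_bytes: int, num_maps: int, num_reduces: int) -> float:
    """Average size of a single fetch, assuming intermediate data is spread evenly (an assumption)."""
    return total_intermediate_bytes / shuffle_fetches(num_maps, num_reduces)


if __name__ == "__main__":
    # Hypothetical job: 1 TiB of intermediate data, 20,000 map tasks, 5,000 reduce tasks.
    TIB = 1 << 40
    m, r = 20_000, 5_000
    print(f"fetches: {shuffle_fetches(m, r):,}")
    print(f"avg bytes per fetch: {avg_fetch_bytes(TIB, m, r):,.0f}")
```

With these numbers the job makes 100,000,000 fetches averaging roughly 11 KB each. If each fetch costs a disk seek, seek latency rather than transfer bandwidth dominates the shuffle, which is why naive shuffling scales so badly as M and R grow.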
They present some good M/R lessons in their refutation. You should be using a binary serialization system like Avro or Protocol Buffers, and storing your data in a format that provides efficient access, either through a natural file structure or a database system like HBase.
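To make the text-versus-binary point concrete, here is a minimal sketch that uses Python's standard `struct` module as a stand-in for a schema-based binary format. It is not Avro or Protocol Buffers, and the record layout is hypothetical. Fixed-width binary records can be located by arithmetic and decoded without string parsing, whereas a text file has to be scanned and split line by line.

```python
import struct
from typing import List, Tuple

# Hypothetical record layout: (timestamp: int64, user_id: int64, score: float64),
# little-endian and fixed width, so every record is exactly RECORD.size bytes.
RECORD = struct.Struct("<qqd")

Record = Tuple[int, int, float]


def encode_binary(records: List[Record]) -> bytes:
    """Pack records back to back with no delimiters."""
    return b"".join(RECORD.pack(*r) for r in records)


def read_binary_record(buf: bytes, i: int) -> Record:
    """Random access: record i starts at offset i * RECORD.size; earlier records are never parsed."""
    return RECORD.unpack_from(buf, i * RECORD.size)


def encode_text(records: List[Record]) -> str:
    """Comma-separated text, one record per line."""
    return "".join(f"{t},{u},{s!r}\n" for t, u, s in records)


def read_text_record(text: str, i: int) -> Record:
    """To find record i in text, every preceding line must be scanned and split."""
    line = text.splitlines()[i]
    t, u, s = line.split(",")
    return int(t), int(u), float(s)


if __name__ == "__main__":
    records = [(1262044800 + k, k, k / 4) for k in range(1000)]
    buf = encode_binary(records)
    text = encode_text(records)
    print(read_binary_record(buf, 500), read_text_record(text, 500))
```

Real formats like Protocol Buffers use variable-length encodings, so they give up this simple offset arithmetic in exchange for compactness and schema evolution; the savings from not parsing text are the part that carries over.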

NY Times hits new low: Gives voice to quack on "Search Neutrality"

The NY Times ran an op-ed article, Search, but you may not find. I can't believe they ran such rubbish. I'm not going to bother debunking it; Paul Kedrosky did a better job than I could.

The problem is that commercial search engines are inherently conflicted: they have products to sell and advertisers to please. The question is: Should search be a public service, like a library?

The French are taking on Google Books with Polinum, the "Operating Platform for Digital Books." Jimmy Wales's efforts with Wikia Search failed because they didn't execute and weren't profitable. Daniel, a longtime advocate of transparency in search, now works for Google.

There will always be disgruntled quacks, but in the long run, is it healthy for one company, or even a small group of companies, to hold such a large share of search?

Sunday, December 27

NY Times Article on Children's Search

The NY Times had a recent article on search for kids. It covered a user study, sponsored by Google and performed by Allison Druin at the HCI lab at UMd, that worked with 83 kids to understand how they search. My wife is an elementary school teacher, so this is a topic we've often discussed and one I find particularly interesting.

In related recent work, Druin published How Children Search the Internet with Keyword Interfaces, a study of 12 kids. Read section 6 for their user interface suggestions, which include (1) voice search instead of typing, (2) simplified results pages, and (3) results at an appropriate reading level. The NY Times article appears to describe a larger follow-up study.

The NY Times interviewed Irene Au, Google’s Director of User Experience, about ways the research could be incorporated into a product. They note that the keyword mismatch problem is much more challenging for kids, who have less of the conceptual framework a subject requires to search it effectively. From the article: “The problems that kids have with search are probably the problems adults experience, just magnified... If we can solve that for children we can solve that for adults.” However, I'm not convinced that this conclusion is correct. Druin says that the bottom of the screen is an important place to suggest related searches.

In the article, representatives from Bing and also weigh in; a representative from Y! is notably absent, given Y!'s presence in this market. Stefan Weitz from Bing suggests that visual interfaces offer an opportunity because kids haven't developed typing skills. Scott Kim, from says that kids are more likely than adults to ask questions. Perhaps if we catch them early enough, we can study them before they are brainwashed into thinking in keywords.

The article also briefly mentions that voice search, like that used for mobile search, offers an interface opportunity for kids who lack typing skills.

At the end of the article, one of the kids interviewed makes a suggestion: “I think there should be a program where Google asks kids questions about what they’re searching for,” he said, “like a Google robot.”

I look forward to reading the paper on the study. Hopefully it will contain the concrete solutions for improving the search experience for kids that their earlier work foreshadows.