For their description, you can read the announcement on the Y! Sandbox. The demo uses a set of 8 million Q&A documents from Yahoo! Answers collected in 2007. Here's their description of some of the challenges they faced:
The first one is to select the right "lexical units" of the collection in order to produce meaningful browsing suggestions. The next challenge is to develop interesting list suggestions, on the fly, for whatever query the user may submit. Lastly, we had to invent an interface that would allow users to interact with the suggestions and the results, and enable a natural browsing experience.
They used the DeSR dependency parser to extract terms and phrases, and then used a forward index with Archive4J to count and sort the terms in the questions that are returned by a query.
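To make the count-and-sort step concrete, here is a minimal sketch in Python of how a forward index (document id to extracted terms) can produce suggestions for a result set. This is not Yahoo!'s code and it does not use DeSR or Archive4J; the `FORWARD_INDEX` contents and the `suggest_terms` function are hypothetical, made up purely for illustration.

```python
from collections import Counter

# Hypothetical forward index: document id -> terms extracted from that document.
# In the real system these terms would come from a parser; here they are invented.
FORWARD_INDEX = {
    1: ["pasta", "salad", "olives"],
    2: ["pasta", "sauce", "tomato"],
    3: ["pasta", "salad", "chicken"],
    4: ["bread", "yeast"],
}


def suggest_terms(result_ids, forward_index, top_n=5):
    """Count terms across the documents a query returned and rank them."""
    counts = Counter()
    for doc_id in result_ids:
        # Use a set so each term is counted at most once per document.
        counts.update(set(forward_index.get(doc_id, [])))
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Assume a query for "pasta" returned documents 1, 2 and 3.
suggestions = suggest_terms([1, 2, 3], FORWARD_INDEX, top_n=3)
print(suggestions)
```

Note that the most frequent terms in a sketch like this tend to be the query terms themselves, so a real term selector would presumably need to filter or re-weight them.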
I tried it for pasta and then filtered to "pasta salad". I was hoping that some of the nouns would include common ingredients: bacon, chicken, olives, onion, pepperoni, mozzarella cheese, etc. However, most of the nouns and verbs are more general and somewhat redundant given my selected filters. I think the algorithm to select the terms could still be improved.
Faceted search interfaces are important browsing tools, and automatically extracting and selecting facets is a challenging problem. It's good to see first steps applying NLP to the task. I look forward to seeing how Quest evolves.
Be sure to check out the Correlator demo if you haven't seen it.