Steve talked about TREC and the fact that its test collections are static, while real-world collections evolve. A potentially interesting avenue for research is dynamic test collections: modeling how often documents change or are created, and the impact this has on precision and recall. Another interesting problem is that very recent documents don't yet have the links or link text that older documents have accumulated. How should this be modeled for relevance?
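As a reference point, precision and recall are simple set ratios, which is what makes the dynamic-collection question interesting: as documents are added or removed, the relevant set shifts and both numbers drift. A minimal sketch (the document IDs below are made up):

```java
import java.util.Set;

public class EvalMetrics {
    // precision = |relevant ∩ retrieved| / |retrieved|
    static double precision(Set<String> retrieved, Set<String> relevant) {
        long hits = retrieved.stream().filter(relevant::contains).count();
        return retrieved.isEmpty() ? 0.0 : (double) hits / retrieved.size();
    }

    // recall = |relevant ∩ retrieved| / |relevant|
    static double recall(Set<String> retrieved, Set<String> relevant) {
        long hits = retrieved.stream().filter(relevant::contains).count();
        return relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> relevant = Set.of("d1", "d2", "d3", "d4");
        Set<String> retrieved = Set.of("d1", "d2", "d5");
        System.out.println("precision = " + precision(retrieved, relevant)); // 2/3
        System.out.println("recall    = " + recall(retrieved, relevant));    // 0.5
    }
}
```

If a dynamic collection later adds a new relevant document d6, the same result set's recall drops with no change to the system at all.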
He also talked about personalization in search and the use of Minion for content-based recommendation systems.
Personalization in search is a hot topic right now. How long does a query stay useful? Which interests are locally temporal (e.g. researching a trip) and which reflect longer-term preferences (say, software engineering and cooking)?
Steve spent quite a bit of time talking about collaborative filtering and recommendation engines. One memorable quote: "recommendation is the new search." He talked about using Minion for the Aura recommendation engine. Paul Lamere used Minion to perform content-based similarity over Last.fm tags for a music collection, and the Minion-based system performed best in their evaluation.
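The talk doesn't describe Aura's internals, but the general technique of tag-based content similarity is straightforward: represent each artist as a vector of tag counts and compare vectors with cosine similarity. A minimal sketch (the artists and tag counts are invented, and this is my own illustration, not Aura or Minion code):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TagSimilarity {
    // Cosine similarity between two sparse tag-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> tags = new HashSet<>(a.keySet());
        tags.addAll(b.keySet());
        double dot = 0, normA = 0, normB = 0;
        for (String t : tags) {
            int x = a.getOrDefault(t, 0);
            int y = b.getOrDefault(t, 0);
            dot += (double) x * y;
            normA += (double) x * x;
            normB += (double) y * y;
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> artistA = Map.of("indie", 40, "rock", 25, "lo-fi", 10);
        Map<String, Integer> artistB = Map.of("indie", 30, "rock", 20, "electronic", 15);
        System.out.printf("similarity = %.3f%n", cosine(artistA, artistB));
    }
}
```

In practice a search engine like Minion does the heavy lifting of indexing the tags and finding candidate neighbors instead of comparing everything pairwise.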
One of the questions at the end was about Minion vs. Lucene. Steve has written about this on his blog, but I found his brief answers informative:
- Minion supports field data types beyond String, such as dates and numeric values, which enable parametric query operations on fields.
- Minion ships with an English morphology engine that generates different word forms for query expansion out of the box.
- Minion has a run-time configuration system driven by XML files, whereas Lucene is configured in code.
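To see why typed fields matter, consider what happens when numbers are indexed as plain strings: range queries can only compare lexicographically, so the classic workaround is zero-padding values into fixed-width keys. This sketch (my own illustration, not Minion or Lucene code) shows the problem and the workaround:

```java
public class PaddedKeys {
    // Encode a number as a fixed-width, zero-padded string so that
    // lexicographic order matches numeric order in a string-only index.
    static String pad(long n) {
        return String.format("%010d", n);
    }

    public static void main(String[] args) {
        // Raw numeric strings sort in the wrong order lexicographically:
        System.out.println("9".compareTo("10") > 0);       // true: "9" sorts after "10"
        // Zero-padded keys restore the correct order:
        System.out.println(pad(9).compareTo(pad(10)) < 0); // true
        // Dates need the same treatment, e.g. sortable "yyyyMMddHHmmss" strings.
        // A field type that is natively numeric or date-valued avoids such tricks.
    }
}
```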
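For a feel of what out-of-the-box query expansion buys you, here is a deliberately naive word-form generator. These are my own toy suffix rules for illustration; a real morphology engine like Minion's uses proper inflection rules and exception lists:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NaiveMorphology {
    // Generate a few candidate English word forms for a query term.
    // Toy rules only: real morphology handles irregular forms, doubling, etc.
    static Set<String> expand(String term) {
        Set<String> forms = new LinkedHashSet<>();
        forms.add(term);
        // Plural: terms ending in s/x/ch/sh take "es", otherwise "s".
        if (term.endsWith("s") || term.endsWith("x")
                || term.endsWith("ch") || term.endsWith("sh")) {
            forms.add(term + "es");
        } else {
            forms.add(term + "s");
        }
        // Past tense and gerund: drop a trailing "e" before "ing".
        if (term.endsWith("e")) {
            forms.add(term + "d");
            forms.add(term.substring(0, term.length() - 1) + "ing");
        } else {
            forms.add(term + "ed");
            forms.add(term + "ing");
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(expand("index")); // [index, indexes, indexed, indexing]
    }
}
```

A query for "index" expanded this way would also match documents containing "indexed" or "indexing", which is the point of doing expansion at query time.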
After watching the video, I downloaded the Minion source code and started poking around, and hit a few minor hiccups. I couldn't find any developer documentation, so I just went for it. First, I use Eclipse while the development team appears to use NetBeans, so I think I ran into some tooling differences. The Ant build script failed because JUnit was missing from the classpath, and the normal Eclipse build failed because the JavaCC-generated parser classes were absent; they are only generated by the Ant script. I eventually got the Ant script to run and build a jar file, after which the rest of the project compiled.
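For the JUnit problem, the usual fixes are to drop a junit jar into Ant's own lib directory (or ~/.ant/lib), or to put it on a classpath in the build file. A hypothetical build-file fragment showing the latter; the property name, jar version, and paths are illustrative, not taken from Minion's actual build script:

```xml
<!-- Hypothetical: point the build at a locally downloaded junit jar. -->
<property name="junit.jar" location="lib/junit-4.12.jar"/>
<path id="test.classpath">
    <pathelement location="${junit.jar}"/>
    <pathelement location="${build.classes.dir}"/>
</path>
```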
One thing that would be really useful is a good set of examples. How does the XML run-time configuration system work? Does Minion support document boosting similar to Lucene?
When I get a bit more time I'll give it more of a shot on some data I have lying around.