I appreciate Erik's in-depth responses to the questions. In the process he shares some wisdom from his grad student days at UW, in particular how the research focus of MetaCrawler evolved. For example, the mechanics of distributed querying weren't an interesting research problem:
However, that tool could be used to collect a large number of web pages about a topic from “knowledgeable sources” and thus we could do something to analyze semantic structure. However, this wasn’t terribly well defined, and by the time we had MetaCrawler, we still weren’t sure what structure we’d want to investigate and even what kinds of semantics we were interested in. So, that part of the project was dropped, and we focused more on the research of MetaCrawler itself.
Things don't work out as planned, but good researchers adapt and shift focus. One last nugget of wisdom for researchers from the interview:
Oren’s advice on the matter was to always investigate surprises with great vigor. Predictable things are, well, predictable, and the research that comes from steady improvement, while beneficial, tends to be rather boring. However, when you discover something that was unexpected, the results and explanations are almost always exciting and fascinating.
I can't help but notice two connections between meta-search and current search engines:
- The decision to perform 'deep web surfacing' rather than federating results from third-party data sources. For example, Google has started crawling the data behind forms. See the recent paper, Google's Deep-Web Crawl.
- The rise of "Universal Search", the process of blending results from multiple vertical search indices, is an interesting application of meta-search. Is there research that focuses on the unique challenges of this use case? Considering its importance to industry, it's surprising to see the dearth of recent work in this area.