Ellen Voorhees defends Cranfield.
At ECIR, Nick Belkin and Amit Singhal both highlighted limitations of the Cranfield evaluation methodology. (For more on Cranfield, see Ellen's description from 2005.) This is the methodology used at TREC and by most people in the research community.
Here's a recap of the limitations outlined at ECIR:
- The pooling approach to building relevance judgments biases evaluation against revolutionary new methods: documents they retrieve that were never pooled are unjudged, and unjudged documents are scored as non-relevant
- Documents and queries evolve rapidly over time, and these changes are not modeled in static test collections and query sets
- In the real world, Cranfield-style evaluations are incredibly expensive to build and quickly out of date
- It doesn't easily allow for interactive sessions; i.e., there is no 'conversation' between the search engine and the user
- It is far removed from the real users' environments and search tasks
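The pooling bias in the first point is easy to see in miniature. Here's a toy sketch (hypothetical document IDs and judgments, not real TREC data) of how a metric like precision@k, computed against pooled judgments, penalizes a system that retrieves relevant documents the pool never covered:

```python
# Toy illustration of pooling bias (hypothetical doc IDs and judgments).
# In pooled evaluation, only documents returned by the pooled systems get
# judged; anything outside the pool is scored as non-relevant.

judged = {  # qrels built from the pool: doc_id -> relevance
    "d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1,
}

def precision_at_k(ranking, qrels, k=5):
    # Unjudged documents (absent from qrels) count as non-relevant.
    return sum(qrels.get(d, 0) for d in ranking[:k]) / k

pooled_system = ["d1", "d3", "d5", "d2", "d4"]  # contributed to the pool
novel_system = ["d9", "d8", "d1", "d7", "d3"]   # d7-d9 were never judged

print(precision_at_k(pooled_system, judged))  # 0.6
print(precision_at_k(novel_system, judged))   # 0.4, even if d7-d9 are relevant
```

Even if the novel system's unjudged documents are genuinely relevant, the metric cannot credit them, which is exactly the bias Belkin and Singhal point to.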
Real search engines begin by looking at usage data and running experiments on a fraction of live users, but that's not something academic researchers can easily reproduce.
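The "fraction of users" part typically works by deterministic bucketing: each user is hashed into a bucket so they consistently see either the control or the experimental ranker. A minimal sketch, assuming a simple hash-based scheme (real systems use far more elaborate experiment frameworks):

```python
import hashlib

def in_experiment(user_id: str, fraction: float = 0.01) -> bool:
    # Hash the user id into 10,000 buckets; users landing below the
    # cutoff see the experimental ranker. Deterministic, so the same
    # user gets the same variant on every query.
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000 < fraction

# Roughly 1% of users end up in the experiment.
users = [f"user{i}" for i in range(100_000)]
share = sum(in_experiment(u) for u in users) / len(users)
print(round(share, 3))
```

The academic difficulty is not the bucketing itself but the traffic: without millions of real queries and click logs, there is nothing meaningful to measure in each bucket.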