Traditional search evaluation has focused on the relevance of the results, and of course that is our highest priority as well. But today's search-engine users expect more than just relevance. Are the results fresh and timely? Are they from authoritative sources? Are they comprehensive? Are they free of spam? Are their titles and snippets descriptive enough? Do they include additional UI elements a user might find helpful for the query (maps, images, query suggestions, etc.)? Our evaluations attempt to cover each of these dimensions where appropriate.
One of my biggest issues with TREC and similar evaluation environments is their exclusive focus on topical relevance. See my previous post on the TREC blog track. For example, a spam post that is relevant to a topic would be judged acceptable, even though you would never want to read it in real life. It's time to move beyond the basics and find ways to tackle the more challenging aspects of retrieval quality in a way that is still amenable to cost-effective measurement.
Note: I also highly recommend What People Think About When Searching by Daniel Russell, who analyzes user intent and behavior at Google.