Thursday, July 23

SIGIR Evaluation Workshop: Richer Theories, richer experiments by Stephen Robertson

- Stephen Robertson

- Search as a science
- The role of experiment and empirical data gathering in IR
- The standoff between the Cranfield tradition and user-oriented work
- The role of theory in IR
- Abstraction

A caricature
- On the one hand, we have the Cranfield tradition of experimental evaluation in IR (a powerful lab paradigm, but of limited scope)
- On the other hand, we have observational studies with real users (realistic, but of limited scale)

Experiment in IR
- The Cranfield method was initially only about "which system is best", system in this case meaning the complete package (language, indexing rules, indexing, search, etc.)
- Cranfield was not seen as being about theory or models.

Implicit models
Cranfield 1: effectiveness is a consequence of the general approach, e.g. 'faceted classification', 'Uniterms'

Cranfield 2: effectiveness is a consequence of the combination of low-level components,
e.g. synonyms, morphological variants, and generic and specific terms are conflated.
- Relevance is a way of measuring effectiveness

Theory and experiment in IR
"Theories and models in IR" (1977)
- Cranfield gave us an experimental view of what we are trying to do.
- The measurement is an explicit component of the models
- We have pursued this ever since...

* Traditional probabilistic models
- explicit hidden variable which is relevance
- prediction of this variable
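One way to make "relevance as an explicit hidden variable" concrete is the binary independence model. Here each query term gets a Robertson/Sparck Jones (RSJ) weight, and ranking by the sum of those weights is rank-equivalent to ranking by the estimated odds of relevance. What follows is an illustrative sketch only, not something shown in the talk. The function names `rsj_weight` and `bim_score` and the toy documents are hypothetical.

```python
import math

# Illustrative sketch only: binary independence model with the
# Robertson/Sparck Jones (RSJ) term weight. Function names and the toy
# collection below are hypothetical.

def rsj_weight(n_t: int, N: int, r_t: int = 0, R: int = 0) -> float:
    """RSJ weight for one term.

    N   -- number of documents in the collection
    n_t -- number of documents containing the term
    R   -- number of documents judged relevant (0 if unknown)
    r_t -- number of judged-relevant documents containing the term
    The 0.5 added to each count keeps the ratio finite when counts are zero.
    """
    return math.log(
        ((r_t + 0.5) * (N - n_t - R + r_t + 0.5))
        / ((n_t - r_t + 0.5) * (R - r_t + 0.5))
    )


def bim_score(doc_terms: set[str], query_terms: set[str],
              df: dict[str, int], N: int) -> float:
    """Sum RSJ weights of the query terms that occur in the document.

    With no relevance judgements (R = r_t = 0) each weight reduces to
    log((N - n_t + 0.5) / (n_t + 0.5)), an IDF-like quantity.
    """
    return sum(rsj_weight(df[t], N) for t in query_terms & doc_terms if t in df)


# Hypothetical toy collection: document id -> set of index terms.
docs = {
    "d1": {"cranfield", "evaluation"},
    "d2": {"language", "model"},
    "d3": {"relevance", "evaluation", "model"},
    "d4": {"user", "study"},
}
df: dict[str, int] = {}
for terms in docs.values():
    for t in terms:
        df[t] = df.get(t, 0) + 1

query = {"relevance", "evaluation"}
ranking = sorted(docs, key=lambda d: bim_score(docs[d], query, df, len(docs)),
                 reverse=True)
print(ranking)
```

The only thing this model is asked to predict is the relevance variable. That is exactly the narrowness the talk goes on to criticize.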

* Logical models
- relevance embedded in 'd implies q'

* Language models
- Simple model: relevance embedded in 'd implies q'
- extension: relevance itself as a language model

* Others
- DFR: again an embedded notion
- The focus of all these models is predicting relevance (or what the model takes to be the basis for relevance)

No other hypotheses or predictions are sought, nor other tests made. This is a very limited view.

Scientific method: a brief overview

Traditional science
- The image of science involves experiments in laboratories, but this is misleading. It holds for some fields (small-scale physics, the biochemical end of biology), but not for others.

- Lab experiments involve abstraction
- choice of variables included/excluded
- control of variables
- restrictions on values/ranges of variables
(parts of the world are included/excluded)

Models and theories involve abstraction
- Why? To make them possible
- Why else: reduce noise, clarify relationships, study simple cases.

Newton's laws
- have many uses (motion of planets)
- there are many ways to test them
- and they suggest other experimental measurements (mass, acceleration due to gravity, and G, the gravitational constant)

Abstraction in Newton's laws
Abstraction allows the unification of astronomy and local physics
.. and also the separation of use, testing and..

Testing Newton's laws
- pendulums vs. a toy which also tests it, but has less use.
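The pendulum case can be made concrete with the textbook small-angle derivation. It is also a small example of abstraction at work. The derivation assumes a point mass on a massless rod, with no friction or air resistance and only small swings. Applying Newton's second law to the angle θ of a pendulum of length L:

```latex
m L \ddot{\theta} = -m g \sin\theta \approx -m g \theta
\quad\Longrightarrow\quad
\ddot{\theta} = -\frac{g}{L}\,\theta
\quad\Longrightarrow\quad
T = 2\pi\sqrt{\frac{L}{g}},
\qquad
g = \frac{4\pi^{2} L}{T^{2}}.
```

Timing the period T for a known length L therefore gives a measurement of g. The laws themselves suggest measuring g, so the pendulum both tests them and uses them.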

Information retrieval phenomena
* people writing documents
* users needing information
- to solve some problem or accomplish some task
* these users undertaking information-seeking tasks
* various mechanisms to help them
* a notion of degrees of success or failure

Science and engineering
As IR engineers we concentrate on
- constructing the mechanisms
- measuring the success or failure
-- As scientists we should be looking further

A typical SIGIR paper
1) Construct a model
2) Ask the model how to do ranking for a search
3) Construct a system which follows the advice of the model
4) Choose a baseline
5) Evaluate using TREC data
6) It does better than the baseline...
... therefore the approach/system/model is good.
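Steps 5 and 6 usually come down to a comparison like the one below. A per-query effectiveness measure is computed, averaged over queries, and compared against the baseline. This is a minimal sketch that assumes non-interpolated average precision as the measure. The query ids, document ids, runs and relevance judgements are hypothetical toy data, and real TREC evaluations use standard evaluation tooling rather than hand-rolled code like this.

```python
# Sketch of steps 5-6: compare a system with a baseline by mean average
# precision (MAP). Query ids, document ids, runs and judgements are hypothetical.

def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision for a single query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank   # precision at this relevant document's rank
    return total / len(relevant)   # unretrieved relevant documents contribute 0


def mean_average_precision(run: dict[str, list[str]],
                           qrels: dict[str, set[str]]) -> float:
    """Average AP over all judged queries (a query missing from the run scores 0)."""
    return sum(average_precision(run.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)


qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
baseline = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2", "d3"]}
system = {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1", "d3"]}

map_base = mean_average_precision(baseline, qrels)
map_sys = mean_average_precision(system, qrels)
print(f"baseline MAP = {map_base:.4f}, system MAP = {map_sys:.4f}")
```

The resulting numbers show only that one ranking beat another on these queries. They say nothing directly about whether the model behind the system is right, and that is the gap the "therefore" in step 6 papers over.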

Traditional IR Evaluation
- Primarily concerned with evaluating systems rather than models or theory, but has become the usual way to evaluate models or theories
- evaluating in terms of useful outcomes (despite the above)
-- There are some disconnects here.

User-oriented Research
* A lot of observational work, but also increasingly laboratory experiments
- within and outwith the Cranfield/TREC tradition

* Emphasis of the models and theories
- understanding user behavior.

Some points of contact
- We are interested in the interaction of mechanisms and user behavior
* understanding the abstraction that is relevance
* understanding easily observable behaviors
- clicks, reformulations, terminations

Theory and models
- One way to better understanding is better models.
- The purpose of models is to make predictions, but what do we want to predict? (things with useful applications, or things that inform us about the model)

Predictions in IR
1) What predictions would be useful?
- Relevance, yes... but others!
- redundancy, novelty, diversity
- optimal thresholds
- satisfaction (and other kinds of quality judgments)
- clicks
- search termination
- query modification (and most other aspects of user behavior)
- satisfactory termination
- abandonment/unsatisfactory termination
... and other combinations.

2) What predictions would inform us about models?
- more difficult: it depends on the models (many models are insufficiently ambitious)
- in general, observables/testables
- calibrated probabilities of relevance
- hard queries
- clicks, termination
- patterns of click behavior
- query modification
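Calibrated probabilities of relevance are a prediction that ranking-based evaluation cannot check. Two scoring functions that order documents identically get identical average precision, yet only one of them may produce probabilities that match observed relevance rates. This sketch assumes expected calibration error (ECE) with equal-width bins, one common choice rather than anything proposed in the talk. The scores and labels are hypothetical.

```python
# Sketch: checking whether predicted probabilities of relevance are calibrated.
# Expected calibration error (ECE) with equal-width bins is an assumed choice;
# the scores and labels below are hypothetical.

def calibration_bins(probs: list[float], labels: list[int], n_bins: int = 5):
    """Return (bin index, count, mean predicted prob, observed relevance rate)
    for every non-empty equal-width bin over [0, 1]."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_p, observed))
    return rows


def expected_calibration_error(probs: list[float], labels: list[int],
                               n_bins: int = 5) -> float:
    """Bin-size-weighted average gap between predicted and observed rates."""
    n = len(probs)
    return sum(count / n * abs(mean_p - observed)
               for _, count, mean_p, observed
               in calibration_bins(probs, labels, n_bins))


labels = [1, 1, 0, 0]               # 1 = relevant, 0 = not relevant
scores_a = [0.9, 0.8, 0.2, 0.1]     # same ranking as scores_b ...
scores_b = [0.35, 0.3, 0.1, 0.05]   # ... but systematically underconfident
ece_a = expected_calibration_error(scores_a, labels)
ece_b = expected_calibration_error(scores_b, labels)
print(f"ECE a = {ece_a:.3f}, ECE b = {ece_b:.3f}")
```

Both score lists put the two relevant documents first, so a ranking measure cannot tell them apart. The calibration error does, because the second list is badly underconfident.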

Richer models, richer experiments
* Why develop richer models?
- because we want richer understanding of the phenomena (as well as other predictions)
* Why does it ...
A rich theory should have something to say both to the lab experiments in the Cranfield tradition and observational experiments.

- Justin: the history of science can be thought of as the evolution of measuring instruments... we need to know the engineering is correct in order to measure the theory.
- As soon as observational data became stronger (Ptolemaic vs. Newtonian physics)... in our case, the data is being abstracted (e.g. collections with NLP markup, contextual information, etc.)... we're still in the epicycle phase.
- Should we be engineering tools to solve a task instead of the engine for everything?
