Norbert Fuhr presented the Salton Award keynote speech.
James Allan presented Norbert Fuhr with the 10th Salton award.
He published an IR paper in 1984 in Cambridge, England; the paper was 19 pages long. Since then, he has authored over 200 papers.
- foreshadowing learning ranking functions.
- probabilistic retrieval models
- retrieval models for interactive retrieval
"Information Retrieval as Engineering Science"
[We have to listen to the old guys, but we don't have to accept it, but this doesn't hold for my talk today]
What is IR?
- IR is about vagueness and imprecision in information systems
- User is not able to precisely specify the object he is looking for
--> "I am looking for a high-end Android smartphone at a reasonable price"
- Typically, an interactive retrieval process.
- IR is not restricted to unstructured media
- the person's knowledge about the objects in the database is incomplete / imprecise
-> limitations in the representation
-> imprecise object attributes (unreliable metadata, e.g. availability)
IR vs Databases
-> DB: given a query q, find objects o with o -> q
-> IR: given a query q, find documents d with high values of P(d -> q)
-> DB is a special case of IR! (in a certain sense)
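The contrast can be sketched with a toy example; the documents, query, and the crude score used as a stand-in for P(d -> q) are invented for illustration, not from the talk:

```python
docs = {
    "d1": "high end android smartphone reasonable price",
    "d2": "android tablet cheap price",
    "d3": "laptop review",
}
query = ["android", "smartphone", "price"]

# DB view: return exactly the objects o with o -> q
# (every query term must be implied, here: contained).
db_result = {d for d, text in docs.items()
             if all(t in text.split() for t in query)}

# IR view: rank documents d by an estimate of P(d -> q).
# Crude estimate: fraction of query terms the document contains.
def score(text):
    words = set(text.split())
    return sum(t in words for t in query) / len(query)

ir_result = sorted(docs, key=lambda d: score(docs[d]), reverse=True)

print(db_result)   # only the exact match
print(ir_result)   # all documents, best first
```

The DB view returns a set defined by a hard predicate; the IR view returns everything, ordered by degree of match, so the exact-match set is just the top of the ranking with score 1.0.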
Foundation of DBs
-> Codd's paper on a relational model was classified as Information Retrieval
-> The concept of transactions separated the fields.
The fundamental difference between IR and DB is the handling of the pragmatic level.
DB: User interacts with the application --> DBMS --> DB
IR: User interacts with the IR system -> over the collection
(separation between the management system and the application)
What IR could learn from DB systems
Multiple steps of inference ("joins"): a->b, b->c
-> join, links over documents
-> combine the knowledge across multiple documents
Expressive query language
-> specify the inference scheme
-> specify documents parts/aggregates to be retrieved
Data types and vague predicates
-> not every string is text: times, dates, locations, amounts, person/product names. [entities]
-> provide data type-specific comparison predicates (<, set comparison, etc..)
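A data-type-specific "vague" comparison predicate of the kind the talk suggests IR could borrow from DB systems can be sketched as follows; the 20% soft margin and the price values are invented illustration parameters:

```python
def vague_less_than(value, limit, softness=0.2):
    """Vague '<': 1.0 when clearly below the limit, 0.0 when clearly
    above, linear in between (within softness * limit of the boundary)."""
    margin = softness * limit
    if value <= limit:
        return 1.0
    if value >= limit + margin:
        return 0.0
    return 1.0 - (value - limit) / margin

# "a high-end smartphone at a reasonable price", with limit 500:
print(vague_less_than(450, 500))  # 1.0: clearly reasonable
print(vague_less_than(550, 500))  # 0.5: borderline
print(vague_less_than(700, 500))  # 0.0: too expensive
```

Unlike a hard DB predicate, the borderline case contributes a partial score that a ranking function can combine with other evidence.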
IR as Engineering Science
- Most of us feel that we are engineers. But things are not as simple as they might seem.
-> Example: An IR person in civil engineering.
-- 4 or 5 types of bridges: the Boolean bridge, the vector space bridge, language modeling, etc.
-- build all 5 and see which one stands up.
-- Test the variants in live search
-- Users in IR blame themselves when they drive over a collapsing (non-optimal) system.
-- There could be serious implications by choosing a non-optimal system.
Why we need IR engineering
-> IR is more than web search
Institutions and companies have
- large varieties of collections
- broad range of search tasks
[example: searching in the medical domain. A doctor performs a search, then waits 30 minutes for an answer. As engineers, we could work on getting this down to 10 minutes.]
Limitations of current knowledge
- Moving in Terra Incognita
- Example: Africa. The Western world's knowledge of African geography several hundred years ago; the maps were very inaccurate and incomplete.
- At best, interpolation is reasonable.
- Extrapolation lacks theoretic foundation
-> But how to define the boundaries of current knowledge?
-> Probability Ranking Principle
-> Relevance oriented probabilistic models
-> IR as uncertain inference
-> Language Models
Value of Theoretic Models
-> Deeper insight (scientific interest)
-> General validity as basis for broad range of applications
-> Make better predictions (engineer's view)
We should put more focus on the development of theoretic models.
-> each theory is applicable within a well-defined application range
But, what is the application range?
-> defined by the underlying assumptions
-> Are the underlying assumptions of the model valid? For this, we need experiments to validate them.
- Why vs How experiments
Why -> based on a solid theoretical model.
-> performed for validating the model assumptions
- based on some ad-hoc model
- focus on the outcome
- no interest in the underlying assumptions
-> Improvements that Don't Add Up: Ad Hoc retrieval results since 1998.
-> TREC-8 ad hoc collection, MAP
-> It's easy to show improvements, but few beat the best official TREC result.
-> Over 90% of the papers claim improvements that exist only relative to poor baselines, and do not beat the best TREC results.
-> Improvements don't add up.
Limitations of Empirical Approaches
-> Is computer science truly scientific? CACM 7/2010
Theoretical vs Experimental
- Theoretical models:
-- explanatory power
-- basis for a variety of approaches
-- long-standing
- Experimental results:
-- good results on some collections
-- potential for some further improvements (in limited settings)
-- short-lived
- Ex: Binary Independence Retrieval model
-> terms are distributed independently in the relevant and irrelevant documents
-> did anyone ever check this?
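The independence assumption can be made concrete with a minimal sketch of the Binary Independence Model's term weight (the Robertson/Sparck Jones weight with 0.5 smoothing): assuming terms occur independently in the relevant and in the non-relevant documents, a document's score reduces to a simple sum of per-term weights. The counts below are invented for illustration:

```python
import math

def rsj_weight(N, n, R, r):
    """N docs total, n containing the term, R relevant docs,
    r relevant docs containing the term (0.5-smoothed)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def bim_score(doc_terms, query_terms, stats):
    # The independence assumption is what makes the score additive.
    return sum(rsj_weight(*stats[t]) for t in query_terms if t in doc_terms)

stats = {"android": (1000, 100, 10, 8),   # (N, n, R, r) per term
         "price":   (1000, 300, 10, 5)}
print(bim_score({"android", "price"}, ["android", "price"], stats))
```

The question raised in the talk is whether that additive decomposition is justified, i.e. whether the per-term statistics really are independent in the relevant and non-relevant sets.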
Looking under the hood
-> TF-IDF term weights in a probabilistic notion: P(R) for a class of terms.
-> Plots of relevance vs. tf and idf for TREC ad hoc and INEX IEEE
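The kind of "why" experiment behind such plots can be sketched as follows: estimate P(relevant | tf) empirically by grouping (tf, relevance) observations per term-frequency value. The sample observations are invented; a real study would use TREC or INEX judgments:

```python
from collections import defaultdict

observations = [  # (tf of a query term in a doc, doc judged relevant?)
    (1, False), (1, False), (1, True),
    (2, False), (2, True),
    (3, True), (3, True), (3, False), (3, True),
]

counts = defaultdict(lambda: [0, 0])   # tf -> [relevant count, total]
for tf, rel in observations:
    counts[tf][0] += rel
    counts[tf][1] += 1

p_rel = {tf: rel / total for tf, (rel, total) in sorted(counts.items())}
print(p_rel)  # does P(R | tf) actually grow with tf, as tf-idf assumes?
```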
Towards evidence based IR
-> A large variety of test collections
-> large number of controlled variables
IR Engineering: How are results affected by these components?
-> language, length, collection size, vocabulary size, domain, genre, structure
-> length, linguistic structure, application domain, user expertise
What other variables are also important?
Even assuming these are the important variables, we have a high parameter search space.
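A back-of-the-envelope calculation shows how quickly the space of experimental conditions multiplies; the number of settings assumed per variable below is invented for illustration:

```python
from math import prod

# Variables listed above, with a hypothetical number of settings each.
variables = {
    "language": 5, "doc_length": 3, "collection_size": 4,
    "vocabulary_size": 3, "domain": 6, "genre": 4, "structure": 3,
}

n_conditions = prod(variables.values())
print(n_conditions)  # 5*3*4*3*6*4*3 = 12960 controlled conditions
```

Even with a handful of values per variable, a full factorial design needs tens of thousands of runs, which is why shared benchmarks and meta-studies matter.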
A plug for evaluatIR.org -> supporting benchmarking and meta-studies.
Grand IR Theory vs Empirical Science
-> Theory alone will not do.
Foundations of IR Engineering
-> Base layer: theory. Then evidence, and we build the bridge on top of that.
IR Research Supporting Engineering
1) Theory. Proofs instead of empirics + heuristics
- Experiments for validating underlying assumptions
2) Broad Empirical Evidence
- Strict controlled experimental conditions
- Repeat experiments with other collections / tasks
- variables affecting performance
New IR Applications
- Dozens of IR applications (see the SWIRL 2012 workshop)
- Heuristic approaches are valuable as starting points and for comparison, but they are limited in generality.
-> We don't know how far we can generalize the method.
Conclusion: Possible Steps
-> Encourage theoretic research of the 'why' type, e.g. having a separate conference track for these papers.
-> Define and enforce rigid evaluation standards to be able to perform metastudies
-> Setup repositories for standardized benchmarks.
Nick Belkin -> Where do the assumptions underlying the theory come from? Where do we get evidence? How would you approach that?
-> A: Without any hypothesis, the observations are useless. We need a back and forth between theory and experimentation.
DB and IR
-> Can they be united? DB is part of IR; IR is part of DB. [the issue is bringing the people together]
-> Have we hit the limit of our engineering capability? What are the biggest opportunities for significant progress?
A: We perhaps cannot improve the classical ad hoc setting much further. We need to know more about the user and their task. Smartphone example: your phone knows where you are and what time it is when you are looking for a Chinese restaurant (including opening hours). We need to study the hard tasks of knowledge workers and integrate more deeply into their applications.