I am not at CIKM, but Michael Bendersky sent me his notes from Jamie Callan's keynote address. Gene also posted his writeup on the FXPal Blog.
Jamie Callan: Search Engine Support for Software Applications
- Motivation: SE (search engine) as a "language DB"
- Computer Assisted Language Learning
- Q&A
- Read-the-Web
- IR typically assumes a "user" is a person
- Software applications are a new challenging class of SE users
- From the perspective of an application as "user", expectations of a SE are very low
- E.g., SE's are mostly used for keyword search
- The recall-precision tradeoff discourages SE's from using a highly structured query language (like Indri's)
- BOW query - high recall/low precision
- Structured query - low recall/high precision
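The tradeoff in the last three bullets can be seen on a toy example. This is only a sketch: the corpus, the relevance judgments, and the function names are invented for illustration, and the phrase matcher stands in for the kind of constraint an ordered-window operator in a structured query language would express.

```python
# Toy corpus; documents and relevance judgments are invented for illustration.
docs = {
    "d1": "the white house issued a statement",
    "d2": "a white fence surrounds the house",
    "d3": "the house was painted white last year",
    "d4": "tourists visit the white house every day",
    "d5": "the white presidential house in washington",
}
relevant = {"d1", "d4", "d5"}  # hypothetical judgments: about the White House


def bow_match(query, text):
    """Bag-of-words: every query term occurs somewhere, in any order."""
    tokens = set(text.split())
    return all(term in tokens for term in query.split())


def phrase_match(query, text):
    """Structured: query terms must occur adjacently and in order."""
    q, t = query.split(), text.split()
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def evaluate(matcher, query):
    """Return (precision, recall) of a matcher against the toy judgments."""
    retrieved = {d for d, text in docs.items() if matcher(query, text)}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant)
    return precision, recall


bow_pr = evaluate(bow_match, "white house")
phrase_pr = evaluate(phrase_match, "white house")
print("BOW    (precision, recall):", bow_pr)
print("phrase (precision, recall):", phrase_pr)
```

On this corpus the bag-of-words query retrieves every document, so recall is perfect but precision suffers, while the phrase query retrieves only exact matches and misses the relevant document that words things differently.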
- Motivation II: using rich language/information resources
- Wordnet, Freebase, Dbpedia, ...
- SE's are not very good at using them
- Structured queries and documents are well-studied IR topics, but
- Do we really understand them?
- Maybe the basic structures, but not the more advanced ones
- Document = structured object
- Metadata:
- Fielded text: title, chapters, sections, references
- Relations to other documents
- Example application: REAP Project: Computer Assisted Language Learning
- Find interesting documents/passages for students based on their language level
- Use a structured Indri query language to find relevant documents or document parts
- A typical approach to fields
- Exact Boolean match on the attributes
- Can be brittle.
- Another type of document structure
- Text annotations in documents (POS, semantic labeling, co-referencing)
- Annotations can be considered to be "small fields"
- Problems with retrieval with text annotations
- Annotations are not always 100% accurate / ambiguous
- Missing annotations
- Wrong annotation boundaries
- Conflated annotations: white/JJ house/NN should be white/NP house/NP
- Term weighting in short fields is hard - need to take field length normalization into account.
- Problem of multiple matches: combining evidence from different fields of the same type is not a solved problem.
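To make the field length point concrete, here is a minimal sketch of per-field length normalization. The notes do not say which scheme the talk had in mind; the formula below is the BM25-style length normalization, swapped in as an assumption, and the field lengths and the parameter value `b` are made up.

```python
def normalized_tf(tf, field_len, avg_field_len, b=0.75):
    """BM25-style length-normalized term frequency for one field.

    b = 0 disables normalization; b = 1 normalizes fully by relative length.
    (An assumed scheme for illustration, not the one from the talk.)
    """
    return tf / (1.0 - b + b * field_len / avg_field_len)


# One occurrence of a term in a 3-word title (average title length 5)
# versus one occurrence in a 300-word body (average body length 300).
title_tf = normalized_tf(tf=1, field_len=3, avg_field_len=5)
body_tf = normalized_tf(tf=1, field_len=300, avg_field_len=300)

print("title:", title_tf)
print("body: ", body_tf)
```

A single match in a shorter-than-average title counts for more than a single match in an average-length body. The hard part the notes point to is that statistics from very short fields (and annotations are even shorter) are noisy, so the choice of `b` and of the average length matters a lot.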
- Relations among documents/entities
- Hyperlinks & RDF
- XML
- Relational Retrieval (Lao & Cohen 2010)
- Some parts of metadata are "domain knowledge" --- they really reside outside the documents.
- How to model domain knowledge as an integral part of the documents
- Have different types of documents: paper, journal, authors...
- Have typed relations between the documents: transcribes, appears in, ...
- Have an Indri-like query language to match documents and relations
- Example for use: journal recommendations, expert finding
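The idea of typed documents connected by typed relations can be sketched as a small graph. Everything here is hypothetical: the node ids, the relation names `written_by` and `appears_in`, and the `find_related` helper are invented for illustration, and the ranking is a plain match count rather than the random-walk model of Lao & Cohen or an actual Indri-like query language.

```python
from collections import defaultdict

# Typed "documents" (hypothetical data).
nodes = {
    "p1": {"type": "paper", "text": "structured retrieval with fields"},
    "p2": {"type": "paper", "text": "random walks for relational retrieval"},
    "p3": {"type": "paper", "text": "field weighting for structured retrieval"},
    "a1": {"type": "author", "text": "alice"},
    "a2": {"type": "author", "text": "bob"},
    "j1": {"type": "journal", "text": "journal one"},
    "j2": {"type": "journal", "text": "journal two"},
}

# Typed relations between documents: (source, relation, target).
edges = [
    ("p1", "written_by", "a1"),
    ("p2", "written_by", "a2"),
    ("p3", "written_by", "a1"),
    ("p1", "appears_in", "j1"),
    ("p2", "appears_in", "j1"),
    ("p3", "appears_in", "j2"),
]


def find_related(term, source_type, relation, target_type):
    """Rank target nodes by how many matching source nodes point to them."""
    counts = defaultdict(int)
    for src, rel, dst in edges:
        if rel != relation:
            continue
        if nodes[src]["type"] != source_type or nodes[dst]["type"] != target_type:
            continue
        if term in nodes[src]["text"].split():
            counts[dst] += 1
    # Highest count first; ties broken by node id for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


experts = find_related("structured", "paper", "written_by", "author")
journals = find_related("retrieval", "paper", "appears_in", "journal")
print("experts: ", experts)
print("journals:", journals)
```

Expert finding and journal recommendation become the same operation with different relation and target types, which is the appeal of treating domain knowledge as part of the document collection.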
- Inferred knowledge: Read-the-Web project
- How to integrate the accumulated knowledge in SE's
- Entity search is one example
- General purpose solutions are still in progress.