Wednesday, October 27

CIKM 2010 Jamie Callan Keynote: Search Engine Support for Software Applications

I am not at CIKM, but Michael Bendersky sent me his notes from Jamie Callan's keynote address. Gene also posted a writeup on the FXPal Blog.

Jamie Callan: Search Engine Support for Software Applications

  • Motivation: SE (search engine) as a "language DB"
    • Computer Assisted Language Learning
    • Q&A
    • Read-the-Web

  • IR typically assumes a "user" is a person

  • Software applications are a new challenging class of SE users

  • From an application "user" perspective, expectations of a SE are very low
    • E.g., SE's are mostly used for keyword search

  • The recall-precision tradeoff discourages the use of a highly structured query language (like Indri's)
    • BOW query - high recall/low precision
    • Structured query - low recall/high precision

  • Motivation II: using rich language/information resources
    • Wordnet, Freebase, Dbpedia, ...
    • SE's are not very good at using them

  • Structured queries and documents are well-studied IR topics, but
    • Do we really understand them?
    • Maybe the basic structures, but not the more advanced ones

  • Document = structured object
    • Metadata:
    • Fielded text: title, chapters, sections, references
    • Relations to other documents

  • Example application: REAP Project: Computer Assisted Language Learning
    • Find interesting documents/passages for students based on their language level
    • Use a structured Indri query language to find relevant documents or document parts

  • A typical approach to fields
    • Exact Boolean match on the attributes
    • Can be brittle.

  • Another type of document structure
    • Text annotations in documents (POS, semantic labeling, co-referencing)
    • Annotations can be considered to be "small fields"

  • Problems with retrieval with text annotations
    • Annotations are not always 100% accurate / ambiguous
      • Missing annotations
      • Wrong annotation boundaries
      • Conflated annotations: white/JJ house/NN should be white/NP house/NP

    • Term weighting in short fields is hard - need to take field length normalization into account.

    • Problem of multiple matches: combining evidence from different fields of the same type is not a solved problem.

  • Relations among documents/entities
    • Hyperlinks & RDF
    • XML

  • Relational Retrieval (Lao & Cohen 2010)
    • Example for use: journal recommendations, expert finding
    • Some parts of metadata are "domain knowledge" --- they really reside outside the documents.

    • How to model domain knowledge as an integral part of the documents
      • Have different types of documents: paper, journal, authors...
      • Have typed relations between the documents: transcribes, appears in, ...
      • Have an Indri-like query language to match documents and relations

  • Inferred knowledge: Read-the-Web project
    • How to integrate the accumulated knowledge in SE's
    • Entity search is one example
    • General purpose solutions are still in progress.
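
To make the recall-precision point from the notes concrete, here is a minimal sketch in Python. It is not Indri and not anything shown in the talk: the four-document collection, the relevance judgments, and the helper names (bow_match, title_phrase_match) are all made up for illustration. It compares a bag-of-words match over the whole document with a structured match that requires the phrase to appear in the title field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    body: str


# Hypothetical toy collection; not from the talk.
DOCS = [
    Doc("d1", "white house press briefing", "the president spoke today"),
    Doc("d2", "painting tips", "we painted the house white last summer"),
    Doc("d3", "presidential residences", "the white house is in washington"),
    Doc("d4", "garden notes", "roses and tulips"),
]
# Hypothetical judgments: documents that are about the White House.
RELEVANT = {"d1", "d3"}


def tokens(text):
    return text.lower().split()


def bow_match(doc, terms):
    """Bag-of-words: every term occurs somewhere in the document."""
    bag = set(tokens(doc.title)) | set(tokens(doc.body))
    return all(t in bag for t in terms)


def title_phrase_match(doc, terms):
    """Structured: the terms occur adjacently, in order, in the title field."""
    toks = tokens(doc.title)
    n = len(terms)
    return any(toks[i:i + n] == terms for i in range(len(toks) - n + 1))


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


query = ["white", "house"]
bow = {d.doc_id for d in DOCS if bow_match(d, query)}
structured = {d.doc_id for d in DOCS if title_phrase_match(d, query)}

print("BOW:       ", sorted(bow), precision_recall(bow, RELEVANT))
print("Structured:", sorted(structured), precision_recall(structured, RELEVANT))
```

On this toy data the bag-of-words query finds both relevant documents but also pulls in the house-painting page, while the title-phrase query returns only one document, which is relevant. The structured match is also an exact match on a field, so it shows the brittleness mentioned above: d3 is missed because the phrase appears only in its body.
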

More CIKM coverage soon.

Monday, October 25

Ray Ozzie on the future of computing

Ray Ozzie is leaving his role as Chief Software Architect at Microsoft. In a farewell memo, "Dawn of a New Day", he points to the future:
Instead, to cope with the inherent complexity of a world of devices, a world of websites, and a world of apps & personal data that is spread across myriad devices & websites, a simple conceptual model is taking shape that brings it all together. We’re moving toward a world of 1) cloud-based continuous services that connect us all and do our bidding, and 2) appliance-like connected devices enabling us to interact with those cloud-based services....

It’s the dawn of a new day – the sun having now arisen on a world of continuous services and connected devices.

What does this shift imply for search? We are already seeing growth in mobile search. People are searching more because they have the capability. And these searches tend to be more local in nature because people are more often looking for actionable information now.

One possibility is what Eric Schmidt described as autonomous search. In this model the retrieval system is proactive: it still responds to queries, but it also actively notifies the user in response to changes in the environment. One might describe such a system as an "intelligent information agent".
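
As a rough illustration of the difference between reactive and proactive retrieval, here is a small Python sketch. It is only one possible reading of the idea, not a description of any real system: the ProactiveAgent class, its method names, and the simple all-terms matching rule are assumptions made for this example. Users register standing queries, and the agent reports whom to notify whenever a newly observed document matches.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StandingQuery:
    user: str
    terms: frozenset


@dataclass
class ProactiveAgent:
    # Hypothetical design: a list of standing queries plus a list of seen documents.
    queries: list = field(default_factory=list)
    index: list = field(default_factory=list)

    def register(self, user, text):
        """Store a standing query on behalf of a user."""
        self.queries.append(StandingQuery(user, frozenset(text.lower().split())))

    def search(self, text):
        """Reactive path: answer an explicit query over documents seen so far."""
        terms = set(text.lower().split())
        return [doc for doc in self.index if terms <= set(doc.lower().split())]

    def observe(self, doc):
        """Proactive path: index a new document and return the users to notify."""
        self.index.append(doc)
        words = set(doc.lower().split())
        return [q.user for q in self.queries if q.terms <= words]


agent = ProactiveAgent()
agent.register("alice", "coffee nearby")
notified_first = agent.observe("new coffee shop opened nearby")
notified_second = agent.observe("road closure downtown")
results = agent.search("coffee")
print(notified_first, notified_second, results)
```

The same index serves both paths: search handles the explicit query, while observe is triggered by a change in the environment (here, a new document arriving) rather than by the user.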