Tuesday, July 26

SIGIR 2011 Best Paper Award

The SIGIR 2011 best paper awards were announced.

The winner is:
M. Ageev, Q. Guo, D. Lagun, and E. Agichtein

Honorable mention goes to:
Kevin Haas, Peter Mika, Paul Tarjan and Roi Blanco

See also the notes from the SIGIR 2011 keynote addresses by Qi Lu and ChengXiang Zhai.

SIGIR 2011 Keynote ChengXiang Zhai: Beyond Search: Statistical topic models for text analysis

ChengXiang Zhai gave the second keynote address at SIGIR 2011, held this week in Beijing.

Here are the notes from my friend and fellow UMass grad student Michael Bendersky (follow him on @bemikelive). Also, be sure to check out his workshop on Query Representation and Understanding.

Be sure to read Michael's notes from Qi Lu's first keynote talk on the Future of the Web & Search.

Beyond Search: Statistical topic models for text analysis
  • Complex Task Completion Flow
    - Multiple Searches → Information Synthesis & Analysis → Task Completion
    - Sometimes the process above is iterative

    Examples of complex tasks
    • What laptop to buy?
    • What’s hot in database research?
    • What do people say in blogs on a certain topic? How does the topic coverage change over time?
    • What do people like/dislike about “Da Vinci Code”?

  • Can we model complex tasks in a general way?
  • Can we solve them in a unified framework?
  • How do we bring users into the loop?

  • Proposed solution – Statistical Topic Models
    - Generative model
    - Captures language model shifts based on topics
    - Language model serves as a convenient topic representation
    - Every document has a lot of contextual data (metadata)
      o Author
      o Communities
      o Location
      o Author’s occupation
      o User labels
  • Any combination of contextual data can induce a partition over the documents

  • We should make topics depend on context variables
    o Text is generated from a contextualized PLSA model
    o Fitting such a model enables a wide range of analysis tasks on a document
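
The notes do not give the model's equations, so the following is only a minimal sketch of plain PLSA fitted with EM, not the contextualized model from the talk. The function names (`fit_plsa`, `topic_mix_by_context`), the toy corpus, and the context labels are all assumptions made for illustration. Averaging topic mixtures per context value is a simple post-hoc stand-in for making topics depend on context variables.

```python
import random


def fit_plsa(docs, num_topics, iterations=50, seed=0):
    """Fit plain PLSA with EM. docs is a list of token lists.

    Returns (p_w_z, p_z_d, vocab): P(word | topic), P(topic | doc), vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in docs]
    for d, doc in enumerate(docs):
        for w in doc:
            counts[d][index[w]] += 1

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random starting distributions.
    p_w_z = [normalize([rng.random() + 0.01 for _ in vocab])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 0.01 for _ in range(num_topics)])
             for _ in docs]

    for _ in range(iterations):
        # Tiny constant keeps every probability strictly positive.
        new_w_z = [[1e-12] * len(vocab) for _ in range(num_topics)]
        new_z_d = [[1e-12] * num_topics for _ in docs]
        for d in range(len(docs)):
            for w in range(len(vocab)):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(topic | doc, word).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(num_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(num_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalize expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d, vocab


def topic_mix_by_context(p_z_d, contexts):
    """Average P(topic | doc) over the documents sharing a context value."""
    sums, sizes = {}, {}
    for mix, ctx in zip(p_z_d, contexts):
        acc = sums.setdefault(ctx, [0.0] * len(mix))
        for z, p in enumerate(mix):
            acc[z] += p
        sizes[ctx] = sizes.get(ctx, 0) + 1
    return {ctx: [v / sizes[ctx] for v in acc] for ctx, acc in sums.items()}


# Toy corpus and context labels, invented for this example.
docs = [
    "battery screen laptop battery".split(),
    "laptop screen price".split(),
    "query index ranking query".split(),
    "ranking index database".split(),
]
contexts = ["blog", "review", "blog", "review"]

p_w_z, p_z_d, vocab = fit_plsa(docs, num_topics=2)
by_context = topic_mix_by_context(p_z_d, contexts)
```

Here `by_context` maps each context value to an average topic mixture, which is one way a partition of the documents by metadata can be compared topic by topic.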

  • Applications of contextual topic models
    o Social Network Analysis can help derive more coherent topic models
    o Opinion mining – integration of expert reviews and personal opinions
      • Take into account the well-formed and faceted design of expert reviews to impose context on personal opinions, which come from a variety of unstructured sources (blogs, micro-blogs, review sites, comments)
      • Derive integrated expert/personal opinions on different aspects
      • Infer aspect ratings and weights

  • Using topic models to go from search engine to analysis engine
    o Tasks
      • What is a task?
      • How is a task different from an information need/intent?
      • How do we help users express tasks?
    o What does ranking mean in an analysis engine?
    o How to evaluate the output of the analysis engine?
    o Operators to allow analysis of search results
      • Select, Split, Intersection/Union, Interpret, Rank, Compare
      • Operators can be combined, similar to SQL/InQuery languages
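
The notes name the operators but do not define them, so the semantics below are my own assumptions, sketched over a list of result dictionaries. The field names (`id`, `source`, `text`) and the sample results are hypothetical, and Interpret is left out because nothing in the notes says what it should return. The point is only to show how such operators could compose, much like clauses in a query language.

```python
from collections import Counter, defaultdict


def select(docs, predicate):
    """Keep the results that satisfy a predicate."""
    return [d for d in docs if predicate(d)]


def split(docs, key):
    """Partition results into groups by a key function."""
    groups = defaultdict(list)
    for d in docs:
        groups[key(d)].append(d)
    return dict(groups)


def union(a, b):
    """Results in either list, without duplicate ids."""
    seen = {d["id"] for d in a}
    return a + [d for d in b if d["id"] not in seen]


def intersection(a, b):
    """Results whose id appears in both lists."""
    ids = {d["id"] for d in b}
    return [d for d in a if d["id"] in ids]


def rank(docs, score):
    """Order results by a scoring function, best first."""
    return sorted(docs, key=score, reverse=True)


def compare(groups):
    """Summarize each group by its term counts so groups can be contrasted."""
    return {
        name: Counter(w for d in members for w in d["text"].split())
        for name, members in groups.items()
    }


# Hypothetical search results, invented for this example.
results = [
    {"id": 1, "source": "blog", "text": "battery life is great"},
    {"id": 2, "source": "review", "text": "battery drains fast"},
    {"id": 3, "source": "blog", "text": "screen is sharp"},
]

# Compose operators: select results about the battery, split them by source,
# then compare the vocabulary used in each group.
about_battery = select(results, lambda d: "battery" in d["text"])
by_source = split(about_battery, lambda d: d["source"])
term_counts = compare(by_source)

blog_results = select(results, lambda d: d["source"] == "blog")
blog_on_battery = intersection(blog_results, about_battery)
blog_or_battery = union(blog_results, about_battery)
longest_first = rank(results, lambda d: len(d["text"]))
```

Each operator takes and returns plain lists or dictionaries of results, which is what lets the output of one feed the next.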

SIGIR 2011 Keynote Talk: Qi Lu and The Future of the Web & Search

Qi Lu, the president of Microsoft's Online Services Division, gave the first keynote address at SIGIR 2011, happening this week in Beijing. He laid out Microsoft's vision for the future.

I am in San Francisco at Twitter, but luckily my friend and fellow UMass grad student Michael Bendersky is taking notes (follow him on @bemikelive). Also, be sure to check out his workshop on Query Representation and Understanding.

Future of the Web & Search
  • Agenda
    - Perspective of the web/IT industry
    - Future of search
    - Role of IR
    - Challenges
    - Opportunity

  • The heritage: web of documents
    The future:
    - Social web - Facebook profiles, like buttons
    - Geospatial web: Mobile devices
    - Temporal web: Collection of information over time, real-time microblogging
    - Application web: Fundamental design of the browser doesn’t support new application models

  • IT industry of the future
    - Devices + cloud services
    - Shifting user intent capture from the rigid keyboard/mouse/keywords combination to more natural modalities
      • Natural language understanding
      • Voice recognition
        - On mobile devices
        - In living room products
      • Body gestures - Microsoft Kinect
      • Image/Audio/Video capturing

  • Vision of the future of search
    o Empower people with knowledge
    o Re-organize the web for search to unlock the full potential of the web
      • Better discovery
      • More informed decisions
      • Easier task completions

  • Role of IR
    o Understanding user intent
    o Modeling the web of the world
      • People/places/things
      • Relations
    o Task completion & decision making
    o Incentive engineering for making people do more things on the web

  • Challenges
    o Measurement, evaluation & self-correction
      • Some things are inherently hard to evaluate: objectivity, design, opinions
      • Search results have profound influence on the way people perceive the world
      • It is important that they have no inherent bias or skew

    o Privacy

    o Lack of
      • Tools & understanding in existing disciplines
      • Training & development of cross-disciplinary talent

    o Barriers for academic research
      • Access to data
      • Computing infrastructure
      • Funding
        - Not just based on company agenda
        - Funding projects based on pure creativity

  • Opportunities
    o Opportunities for key breakthroughs in the areas of
      • Serendipitous discovery (e.g. Hunch.com)
      • Information theory for the age of the web and social networks
      • Science of big data

    o Broadening collaborations
      • Research
      • Development (API/tools)
      • Investment (Training & Development)

    o Vibrant community
Follow #sigir2011 for more news, although given the censorship in China, the results are very sparse.