Thursday, November 4

Susan Dumais CIKM 2010 Keynote: Temporal Dynamics in Information Retrieval

I am still catching up on a backlog of items from last week.

Here are more of Michael's notes from Susan Dumais' keynote presentation at CIKM 2010 that addressed the impact of time on web search. Gene also has his notes from the presentation.
  • Change in IR
    • New documents and queries
    • Query volume changes seasonally/periodically
    • Document content changes over time
    • User interactions change over time (e.g., anchor text, page visits)
    • The relevant documents for a query change over time, e.g., “Hurricane Earl” (Sept. 2010 vs. before/after)
    • But evaluation corpora are usually static

  • Digital dynamics are relatively easy to capture; however, the tools for interacting with information (browsers, search engines) are static

  • Characteristics of Web page change
    • Measuring web page change in a large web crawl
    • 33% of web pages changed over a period of 11 weeks
    • 66% of visited pages changed over 5 weeks, 63 changed every hr
    • Avg. time between changes – 123 hr.
    • .com pages change more often than .gov or .org pages
    • Knot point: the place on the change curve where the page stabilizes over time; it characterizes the way a page changes
    • Term-level changes
      • Looking at the characteristic terms of a page and their “staying power”, e.g., “cookbooks” and “ingredients” have high staying power for allrecipes.com, while “barbeque” is more transient
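
To make the "staying power" idea concrete, here is a minimal sketch that scores each term by the fraction of page snapshots in which it appears. This is only an illustration, not the measure used in the talk; the function name `staying_power` and the snapshot data are made up for the example.

```python
from collections import Counter


def staying_power(snapshots):
    """Return, for each term, the fraction of snapshots that contain it.

    `snapshots` is a list of term lists, one per crawl of the same page.
    This is an assumed, simplified definition for illustration only.
    """
    counts = Counter()
    for terms in snapshots:
        # Count each term at most once per snapshot.
        counts.update(set(terms))
    total = len(snapshots)
    return {term: n / total for term, n in counts.items()}


# Hypothetical snapshots of a recipe site's front page.
snapshots = [
    ["cookbooks", "ingredients", "barbeque"],
    ["cookbooks", "ingredients", "pumpkin"],
    ["cookbooks", "ingredients", "turkey"],
    ["cookbooks", "ingredients", "cookies"],
]
power = staying_power(snapshots)
```

Terms that appear in every snapshot get a score of 1.0, while a seasonal term that shows up in one of four snapshots gets 0.25.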

  • Revisitation Patterns on the Web
    • 60-80% of the pages you visit are pages you have already seen
    • 4 revisit patterns:
      • Fast - Navigation within site
      • Hybrid - High quality fast pages
      • Medium - Popular homepages/mail & web applications
      • Slow - Entry pages, bank pages, accessed via search engines

  • Revisitations & Search (Teevan et al., SIGIR 2007; Tyler et al., WSDM 2010)
    • Repeat query 33%
    • Repeat click 39%

  • Relationships between revisits and change (Adar et al., CHI 2009)
    • Monitor change
    • Effect change is not related to change
    • Change can interfere with re-finding
    • The more visitors the page has, the more often it changes
    • Three pages: nytimes.com, woot.com, costco.com
      • Similar change patterns, but different revisit patterns:
      • NYT – fast revisit
      • Woot – medium revisit
      • Costco – slow revisit

  • Diff-IE – Building support for understanding change
    • Browser toolbar that highlights content that has changed since your last visit
    • Non-intrusive and personalized: it shows the changes that are of interest to you, not to the publisher of the page
    • Helps to uncover unexpected important content
    • Facilitates serendipitous encounters
    • Helps to understand page dynamics
    • Will be publicly available later this month from http://research.microsoft.com/en-us/projects/diffie/default.aspx
    • Research surveys show that Diff-IE drives more revisitation
      • Driving visits to pages that change frequently
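
The core idea of highlighting what changed since the last visit can be sketched with Python's standard `difflib` module. This is not how Diff-IE is implemented (the talk notes do not say); it is just a minimal line-level illustration, and the function name `added_lines` and the page text are assumptions made for the example.

```python
import difflib


def added_lines(previous, current):
    """Return the lines of `current` that ndiff marks as new relative to `previous`."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes added lines with "+ "; strip the two-character prefix.
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical cached copy of a page and the copy seen on the next visit.
last_visit = "Deal of the day: toaster\nShipping: $5\n"
this_visit = "Deal of the day: blender\nShipping: $5\n"
changed = added_lines(last_visit, this_visit)
```

Here `changed` holds only the line with the new deal, which is the part a tool like this would highlight. A real browser tool would presumably compare rendered page content rather than raw lines of text.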

  • Leveraging Temporal Dynamics for IR (Elsas & Dumais, WSDM 2010)
    • Use document change rate to set document priors
    • Use term longevity to weight terms
    • Evaluation using static data
      • Using 2k navigational queries
      • Dynamic model outperforms the static baseline

    • Ongoing evaluation collection (Understanding Temporal Query Dynamics, to appear in WSDM 2011)
      • Collect relevance judgments over time, e.g. “march madness” query
      • Document relevance changes over time
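
As a rough illustration of the two ideas above (a document prior set from change rate, and term weights set from longevity), here is a minimal scoring sketch. It is not the model from Elsas & Dumais. The function `score`, the form of the prior, the `prior_strength` parameter, and the assumption that more frequently changing documents get a higher prior are all choices made for this example.

```python
import math


def score(query_terms, doc_term_probs, term_weights, change_rate, prior_strength=1.0):
    """Weighted query log-likelihood plus a log prior based on change rate.

    Assumptions for illustration only:
    - the prior grows with `change_rate` (changes per unit time, >= 0);
    - `term_weights` holds longevity-based weights, defaulting to 1.0;
    - unseen terms get a small floor probability instead of real smoothing.
    """
    log_prior = prior_strength * math.log(1.0 + change_rate)
    log_likelihood = sum(
        term_weights.get(term, 1.0) * math.log(doc_term_probs.get(term, 1e-6))
        for term in query_terms
    )
    return log_prior + log_likelihood


# Two hypothetical documents with identical content but different change rates.
probs = {"recipes": 0.05, "cookbooks": 0.02}
weights = {"cookbooks": 1.5}  # assumed: long-lived terms are weighted up
query = ["recipes", "cookbooks"]
dynamic_score = score(query, probs, weights, change_rate=2.0)
static_score = score(query, probs, weights, change_rate=0.0)
```

With identical content, the document that changes more often ranks higher under this assumed prior. Setting `prior_strength=0.0` falls back to a purely content-based score.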
