Monday, July 7

Susan Dumais SIGIR 2014 ACM Athena Award Lecture

Sue Dumais
 - Introduced by Marti Hearst

Putting the searcher back into search

The Changing IR landscape
 - from an arcane skill for librarians and computer geeks to the fabric of everyone's lives

 - How should the evaluation metrics be enriched to characterize the diversity of modern information systems?

How far we have come....
 - The web was a nascent thing 20 years ago.  
 - In June '94, there were 2.7k websites (13.5% were .com)
 - Mosaic was one year old
 - Search in 1994: 17th SIGIR;  text categorization, relevance feedback and query expansion
 - TREC was 2.5 years old (several hundred k newswire; federal register)
 - TREC 2 and 3, the first work on learning to rank
 - Lycos debuted (Michael "Fuzzy" Mauldin); index size: 54k web pages (first 128 characters)
   --> 400k pages, to 10s of millions
   --> The rise of Infoseek, AltaVista
 - Behavioral logs: #queries/day: 1.5k

Today, search is everywhere
 - Trillions of webpages
 - Billions of searches and clicks per day
 - A core part of everyday life; the most popular activity on the web. 
 - We should be proud, but... search can be so much more.

Search still fails 20-25% of the time.  And you often invest way more effort than you should. Once you find an item, there is no opportunity to do anything with it. 
- Requires both great results and great experiences (understanding users and whether they are satisfied)

Where are the Searchers in Search?
 - A search box to results
 - But, queries don't fall from the sky in an IID fashion; they come from people trying to perform tasks at a particular place and time. 
 - Search is not the end goal; people search because they are trying to accomplish something. 

 - Cranfield style test collections
 - "A ruthless abstraction of the user .."
 - There is still tremendous variability across topics. 
 - What's missing?
  --> Characterization of queries/tasks
       -- How are they selected?  What can we generalize to?
  --> We do not tend to have searcher-centered metrics
  --> Rich models of searchers
  --> Presentation and interaction

Evaluating search systems

Kinds of behavioral data
 - Lab studies  (detailed instrumentation and interaction)
 - Panel studies (in the wild; 100s to 1000s; special search client)
 - Log studies (millions of people; in the wild, unlabeled) - provides what, not why

Observational study
 - look at how people interact with an existing system; build a model of behavior.

Experimental studies
 - compare existing systems; goal: decide if one approach is better than another

Surprises in (Early) web search logs
 - Early log analysis...
  --> Excite logs in 1997, 1999
  --> Silverstein et al. 1998, 2002
  --> web search != library search
  --> Queries are very short, 2.4 words
  --> Lots of people search about sex
  --> Searches are related (sessions)

Queries not equally likely
 - Excite 1999; ~2.5 million queries
 - top 250 queries are 10% of the queries
 - almost a million occurred once
 - top 10: sex, yahoo, chat, horoscope, pokemon, hotmail, games, mp3, weather, ebay
 - tail: acm98; win2k compliance; gold coast newspaper
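
The head/tail skew described above can be measured directly from a raw query log. The following is a minimal sketch, not the analysis used in the talk: the function name `query_concentration`, the lowercase/strip normalization, and the toy log are all assumptions made for illustration.

```python
from collections import Counter

def query_concentration(queries, head_size):
    """Return (traffic share of the top `head_size` distinct queries,
    number of distinct queries that occurred exactly once)."""
    # Assumed normalization: trim whitespace and lowercase before counting.
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    head = sum(c for _, c in counts.most_common(head_size))
    singletons = sum(1 for c in counts.values() if c == 1)
    return head / total, singletons

# Toy log for illustration only; this is not the Excite data.
toy_log = ["sex", "yahoo", "sex", "chat", "acm98", "yahoo", "sex", "win2k compliance"]
share, singletons = query_concentration(toy_log, head_size=2)
print(f"top-2 share: {share:.3f}, singleton queries: {singletons}")
```

On a real log, `head_size=250` would give the "top 250 queries" share quoted above, and the singleton count corresponds to the "occurred once" figure.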

Queries vary over time and task
 - Periodicities, trends, events
 - Trends (e.g. Tesla); repeated patterns (e.g. pizza on Saturday)
 - What's relevant to the queries changes over time (World Cup) -- What's going on now!
 - Task/Individual - 60% of queries occur in a session

What can logs tell us?
 - query frequency
 - patterns
 - click behavior

-- Experiments are the life blood of web systems
 - for every imaginable system variation (ranking, snippets, fonts, latency)
 - if I have 100M dollars to spend, what is important?
 - Basic questions:  What do you want to evaluate?  What are the metrics?

Uses of behavioral logs
 - Often surprising insights about how people interact with search systems
 - Suggest experiments

How do you go from 2.4 words to great results? 
 -> Lots of log data driving important features (query suggestion, autocompletion)

What can't they tell us?
 - Behavior can mean many things
 - Difficult to explore new systems

Web search != library search
 - Traditional "information needs" do not describe web searcher behavior
 - Broder 2002, from AltaVista logs
 - They ran a pop-up survey from June through November 2001

Desktop search != web search
 - desktop search, circa 2000
 - Stuff I've Seen
 - Example searches:
   --> recent email from Fedor that contained a link to his new demo; query: Fedor
   --> pdf of a SIGIR paper on context and ranking sent a month ago; query: SIGIR
 - Deployed 6 versions of the system
 -> Queries: very short; date was by FAR the most common sort order
 -> People seldom switch from a default, but here they did switch from best match to date (e.g. "the information from James"); people remember a rough time.
 -> People didn't care about the exact type of file; they cared that it was an image.
 -> More re-finding than finding; more metadata-driven than best-match-driven
 -> People remember attributes, seldom the details, only the general topic
 --> Rich client-side interface; every time we go into a new area, it has characteristics that are very different from other generations of search

Contextual Retrieval
 - One size does not fit all
  --> SIGIR  (who's asking, where are they, what they have done in the past)
 - Queries are difficult to interpret in isolation
 - SIGIR - information retrieval vs. Special Inspector General for Iraq Reconstruction
 - A single ranking severely limits the potential because different people have different notions of relevance

Potential for Personalization
 - Framework to quantify the variation of relevance for the same query across individuals (Teevan et al., ToCHI 2010)
 - Regardless of how you measure it, there is tremendous potential; it varies widely across different queries
 - 46% potential increase in search ranking
 - 70% if we take into account individual notions of relevance
 - Need to model individuals in different ways
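
One way to read "potential for personalization" is as the gap between the ranking quality each person could get from their own ideal ranking and the quality they get from the single best ranking shared by the group. The sketch below is an illustration under that reading, not the exact framework from Teevan et al.: it assumes nDCG as the quality measure, per-user graded judgments, and a group ranking ordered by summed gain (a simple heuristic). All function names are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain for gains listed in rank order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(ranking, judgments):
    """nDCG of `ranking` (list of doc ids) against one user's judgments."""
    ideal = dcg(sorted(judgments.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([judgments.get(doc, 0) for doc in ranking]) / ideal

def potential_for_personalization(judgments_by_user):
    """1.0 minus the average nDCG of one shared ranking.

    Each user's own ideal ranking has nDCG 1.0 by definition, so the
    returned gap is the headroom that a personalized ranking could recover.
    """
    users = list(judgments_by_user.values())
    docs = sorted({doc for j in users for doc in j})
    # Assumed heuristic: order the shared ranking by total gain across users.
    group_ranking = sorted(docs, key=lambda d: -sum(j.get(d, 0) for j in users))
    avg_group = sum(ndcg(group_ranking, j) for j in users) / len(users)
    return 1.0 - avg_group

# Two hypothetical users who disagree about which result is relevant.
judgments = {
    "user_a": {"sigir.org": 1, "sigir.mil": 0},
    "user_b": {"sigir.org": 0, "sigir.mil": 1},
}
print(round(potential_for_personalization(judgments), 3))
```

When all users agree, the gap is zero; the more their judgments diverge for a query, the larger the gap, which matches the observation that the potential varies widely across queries.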

Personal navigation
 - Teevan et al. SIGIR 2007, Tyler and Teevan WSDM 2010
 - Re-finding in web search; 33% are queries you've issued before
 - 39% of clicks are things they've visited before
 - "Personal" navigation queries
 --> 15% of queries 
 --> simple 12 line algorithm
 --> If you issued a query and clicked on only one link twice, you are 95% likely to do it again
 - Resulted in online A/B experiments (successfully)
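
The rule in the bullets above is simple enough to sketch in a few lines. This is one reading of it, not the published algorithm: it assumes the log is a list of `(user, query, clicked_urls)` tuples in time order, and that "clicked on only one link twice" means the user's last two instances of the query each had exactly one click, on the same URL. The names `build_history` and `predict_personal_navigation` are hypothetical.

```python
from collections import defaultdict

def build_history(log):
    """Map (user, query) -> list of click sets, one per past query instance."""
    history = defaultdict(list)
    for user, query, clicked_urls in log:
        history[(user, query)].append(frozenset(clicked_urls))
    return history

def predict_personal_navigation(history, user, query):
    """Return the URL this user is expected to click, or None if no prediction."""
    past = history.get((user, query), [])
    if len(past) < 2:
        return None
    previous, latest = past[-2:]
    # Predict only if both instances had a single click on the same URL.
    if len(latest) == 1 and previous == latest:
        return next(iter(latest))
    return None

# Hypothetical log entries in time order.
toy_log = [
    ("u1", "sigir", ["sigir.org"]),
    ("u1", "sigir", ["sigir.org"]),
    ("u2", "sigir", ["sigir.org"]),
]
history = build_history(toy_log)
print(predict_personal_navigation(history, "u1", "sigir"))
print(predict_personal_navigation(history, "u2", "sigir"))
```

A predicted URL could then be promoted to the top of that user's results for the query; queries without a prediction fall back to the default ranking.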

Adaptive Ranking
Bennett et al. SIGIR 2012
 - Queries don't occur in isolation
 - 60% of sessions contain multiple queries
 - 50% of search time is spent in sessions that last 30+ mins (infrequent, but important)
 - 15% of tasks continue across sessions
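
Session statistics like these depend on how a session is defined, which the notes do not state. The sketch below assumes a common heuristic: a new session starts after 30 minutes of inactivity. The cutoff, the function names, and the toy events are assumptions for illustration.

```python
from datetime import datetime, timedelta

def split_sessions(events, gap=timedelta(minutes=30)):
    """Group one user's (timestamp, query) events into sessions.

    A new session starts whenever the time since the previous event
    exceeds `gap` (an assumed inactivity cutoff).
    """
    sessions = []
    for ts, query in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions

def multi_query_share(sessions):
    """Fraction of sessions that contain more than one query."""
    return sum(1 for s in sessions if len(s) > 1) / len(sessions)

# Hypothetical events for a single user.
events = [
    (datetime(2014, 7, 7, 9, 0), "sigir 2014"),
    (datetime(2014, 7, 7, 9, 10), "sigir 2014 program"),
    (datetime(2014, 7, 7, 11, 0), "gold coast weather"),
]
sessions = split_sessions(events)
print(len(sessions), multi_query_share(sessions))
```

Changing `gap` changes every downstream number, so any figure such as "60% of sessions contain multiple queries" should be read together with the session definition behind it.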

User Model
 - specific queries, URLs, topic distributions
 - Session (short) +25%
 - Historic (long) +45%
 - combinations  - 65-75%

- By third query in a session, just pay attention to what is happening now. 

 - We have complementary methods to understand and model searchers
 - Especially important in new search domains and in accommodating the variability we see across people and tasks in the real world

 - More and more importance of spatial-temporal context (here now)
 - Richer representations and dialogs
  --> e.g. knowledge graphs
 - More proactive search (especially in mobile)
 - Tighter coupling of digital and physical worlds 
 - Computational platforms that couple human and algorithmic components
 - If search doesn't work for people, it doesn't work; Let's make sure it does!!!

We need to extend our evaluation methodologies to handle the diversity of searchers, tasks, and interactivity.

Disclaimer: The views here expressed are purely mine and do not reflect those of Google in any way.

