Thursday, April 29

Danah Boyd WWW keynote: Privacy and Publicity in Big Data

Today Danah Boyd gave an address on privacy and publicity in the context of big data at WWW 2010. Danah released a crib sheet summary on her website, which you should read. Here are Michael's notes from the talk.
  • Privacy concerns are everywhere
    • Big data (social data created
      by people) magnifies these concerns

  • Data is cheap today
    • Making sense of the data, however, is still hard
    • Accessing and processing it in an ethical way is not investigated

  • Methodological issues
    • Big data introduces more
      questions than answers
    • Ethnography tries to answer
      some of these “why” questions

  • Social Sciences Approach
    – 4 key points
    • Bigger is not always better
    • Not all data are equal
    • “What” != “Why”
    • Be careful in interpretations

  • Sampling
    • The way you sample affects
      your results; it is hard to create a truly random, representative sample
    • Big data doesn’t mean
      “the whole of the data”
    • No matter how many tweets
      you have, your sample is always biased
      • Oversampling users who tweet
        frequently

  • Not all data are equal
    • What does your network represent?
      Types of social network
      • Articulated
      • Behavioral
      • Personal

    • Data from Facebook is not
      necessarily more accurate than data from other (smaller) social networks
      • Facebook friends != person’s social network
      • Frequency of conversation != personal closeness
  • What != Why
    • Correlation does not mean causation
    • Even if your model shows that two events are connected, that doesn’t mean one causes
      the other
    • Results need to be interpreted
      • Technology can corrupt social
        science research by making simplifying assumptions and ignoring
        the context in which the original results were obtained

    • Uncertainty principle applies
      • Networks are made of people,
        not of abstract nodes on the graph
      • Data in the network is about
        real people’s lives

  • Just because data is accessible
    doesn’t mean that using it is ethical!
    • Privacy is context
    • Walls (Technology) have
      ears (and mouths)

  • Five points for privacy and security
    • Security through obscurity
      • Violated more and more by
        technology
      • Technologies change people’s
        behavior
    • Not all is meant to be publicized
      • Do we all want to become
        “digital micro-celebrities” and fear the “digital paparazzi”?
    • PII vs. PEI (Personally Identifiable
      vs. Personally Embarrassing Information)
      • Algorithms have a hard time
        discerning PII & PEI
    • Data out of context is a
      privacy violation
    • Privacy is not access control

  • People care about privacy
    • But they also care about
      publicity: a right to be in public

  • Facebook
    • Facebook users have an impression
      that “Facebook is more private than MySpace”
    • Newsfeed – publicizing
      implicit (but accessible) content in an explicit way
      • Initially controversial,
        became a great success
      • Created a set of norms in
        the “Facebook world”
    • Beacon – people are vessels
      for advertisements
      • Was a failure, ended in
        a user lawsuit
    • New default privacy settings
      • Research shows that people
        do not understand their privacy settings in Facebook
      • In fact, their mental map
        of settings doesn’t match the actual settings
    • Slow changes from private
      to public
      • Users are like frogs who
        are slowly “cooked” and do not realize it
      • Data from 3rd-party sites
        is slowly aggregated
        • Tastes, web actions
          are made public
    • Opt-out is the norm at Facebook
      • People do not understand
        what they implicitly agree to

  • Regulations
    • Involvement from governments
      (esp. from Europe, Canada)
    • Researchers need to
      understand the consequences of their analysis
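
The sampling point in the notes above (no matter how many tweets you have, sampling tweets oversamples users who tweet frequently) can be made concrete with a small sketch. This is not from the talk: the toy population, the user names and the two helper functions are all made up for illustration. It compares how likely each user is to be picked when you draw a tweet uniformly at random versus when you draw a user uniformly at random.

```python
def share_when_sampling_tweets(tweet_counts):
    """Chance that one uniformly drawn tweet belongs to each user."""
    total = sum(tweet_counts.values())
    return {user: n / total for user, n in tweet_counts.items()}


def share_when_sampling_users(tweet_counts):
    """Chance that one uniformly drawn user is each user."""
    n_users = len(tweet_counts)
    return {user: 1 / n_users for user in tweet_counts}


# Hypothetical toy population: one heavy tweeter and nine light ones.
tweet_counts = {"heavy": 91}
tweet_counts.update({f"light{i}": 1 for i in range(9)})

by_tweet = share_when_sampling_tweets(tweet_counts)
by_user = share_when_sampling_users(tweet_counts)

print(f"{by_tweet['heavy']:.2f}")  # 0.91
print(f"{by_user['heavy']:.2f}")   # 0.10
```

In this made-up population the heavy tweeter is 10% of the users but 91% of the tweets, so any conclusion drawn from a tweet-level sample mostly describes that one person. Collecting more tweets does not change the ratio.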


