- Privacy concerns are everywhere
- Big data (social data created by people) magnifies these concerns
- Data is cheap today
- Making sense of the data, however, is still hard
- Accessing and processing it in an ethical way has not been investigated
- Methodological issues
- Big data introduces more questions than answers
- Ethnography tries to answer some of these “why” questions
- Social sciences approach: 4 key points
- Bigger is not always better
- Not all data are equal
- “What” != “Why”
- Be careful in interpretations
- Sampling
- The way you sample affects your results: it is hard to create a truly random, representative sample
- Big data doesn’t mean “the whole of the data”
- No matter how many tweets you have, your sample is always biased
- Oversampling users who tweet frequently
- Not all data are equal
- What does your network represent? Types of social network:
- Articulated
- Behavioral
- Personal
- Data from Facebook is not necessarily more accurate than data from other (smaller) social networks
- Facebook friends != person’s social network
- Frequency of conversation != personal closeness
- What != Why
- Correlation does not mean causation
- Even if your model shows that two events are connected, that doesn’t mean one causes the other
- Results need to be interpreted
- Technology can corrupt social science research by making simplifying assumptions and ignoring the context in which the original results were obtained
- Uncertainty principle applies
- Networks are made of people, not of abstract nodes on the graph
- Data in the network is about real people’s lives
- Just because data is accessible doesn’t mean that using it is ethical!
- Privacy is context
- Walls (technology) have ears (and mouths)
- Five points on privacy and security
- Security through obscurity
- Violated more and more by technology
- Technologies change people’s behavior
- Not all is meant to be publicized
- Do we all want to become “digital micro-celebrities” and fear the “digital paparazzi”?
- PII vs. PEI (Personally Identifiable Information vs. Personally Embarrassing Information)
- Algorithms have a hard time telling PII and PEI apart
- Data out of context is a privacy violation
- Privacy is not access control
- People care about privacy
- But they also care about publicity: a right to be in public
- Facebook users have an impression that “Facebook is more private than MySpace”
- Newsfeed: publicizing implicit (but accessible) content in an explicit way
- Initially controversial, became a great success
- Created a set of norms in the “Facebook world”
- Beacon: people are vessels for advertisements
- Was a failure, ended in a user lawsuit
- New default privacy settings
- Research shows that people do not understand their privacy settings in Facebook
- In fact, their mental map of settings doesn’t match the actual settings
- Slow changes from private to public
- Users are like frogs who are slowly “cooked” and do not realize it
- Data from 3rd party sites is slowly aggregated
- Tastes, web actions are made public
- Opt-out is the norm at Facebook
- People do not understand what they implicitly agree to
- Regulations
- Involvement from governments (esp. from Europe, Canada)
- Researchers need to understand the consequences of their analysis
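
The sampling point in the notes (a sample of tweets oversamples users who tweet frequently) can be illustrated with a small simulation. This is a minimal sketch in Python and not something shown in the talk: the user names, the tweet counts, and the helper `tweet_sample_share` are all made up for illustration.

```python
import random
from collections import Counter


def tweet_sample_share(tweets_per_user, sample_size, seed=0):
    """Draw tweets uniformly at random and return each user's share of the sample.

    Hypothetical helper: tweets_per_user maps a user name to a tweet count.
    """
    rng = random.Random(seed)
    # One entry per tweet, labelled with the user who wrote it.
    tweets = [user for user, n in tweets_per_user.items() for _ in range(n)]
    counts = Counter(rng.sample(tweets, sample_size))
    return {user: counts[user] / sample_size for user in tweets_per_user}


# Assumed population: 1 heavy tweeter and 99 occasional ones.
population = {"heavy_user": 1000}
population.update({f"light_user_{i}": 10 for i in range(99)})

shares = tweet_sample_share(population, sample_size=500)
print(f"heavy user as share of users:  {1 / len(population):.1%}")
print(f"heavy user as share of sample: {shares['heavy_user']:.1%}")
```

Under these assumed numbers the heavy tweeter is 1% of the users but writes about half of all tweets, so a uniform sample of tweets says far more about that one person than about the other 99. Collecting more tweets does not change the proportion; only sampling users rather than tweets would.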
Thursday, April 29
Danah Boyd WWW keynote: Privacy and Publicity in Big Data
Today Danah Boyd gave an address on privacy and publicity in the context of big data at WWW 2010. Danah released a crib sheet summary on her website, which you should read. Here are Michael's notes from the talk.