Thursday, October 27
CIKM 2011 Industry: Model-Driven Research in Social Computing
Ed Chi
Google Social Stats
250k words per minute on Blogger (360 million words per day)
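The two Blogger figures agree with each other, since a day has 1,440 minutes:

```latex
250{,}000\ \tfrac{\text{words}}{\text{min}} \times 1{,}440\ \tfrac{\text{min}}{\text{day}} = 360{,}000{,}000\ \tfrac{\text{words}}{\text{day}}
```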
100M+ people take a social action on YouTube
Google+ Stats
40 million users have joined since launch
Users are 2x-3x more likely to share content with one of their circles than to make a public post
These numbers are hard to talk about because the systems are changing rapidly
Ed joined Google to work on Google+
Social Stream Research
Analytics
- Factors impacting retweetability (IEEE Social Computing)
- Location field of user profiles
Motivation for studying languages
- Twitter is an international phenomenon
- How do users of different languages use Twitter?
- How do bilingual users spread information across languages?
Data Collection & Processing
- 62M tweets collected over 4 weeks from the Spritzer feed, April-June 2010
- Language detection with the Google Language API + LingPipe
- 104 languages detected
- Top 10 languages (top five listed):
-- English: 51%
-- Japanese: 19%
-- Portuguese: 9.6% (mostly Brazil)
-- Indonesian: 5.6%
-- Spanish: 4.7%
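Per-language shares like these come from counting one detected label per tweet. A minimal sketch, assuming the detector output is already a list of language codes; the codes and the toy data are made up, not the study's data:

```python
from collections import Counter

def language_shares(labels, top_k=10):
    """Return the top_k languages and their share (in percent) of all labeled tweets.

    `labels` holds one detected language code per tweet (hypothetical input format).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [(lang, 100.0 * n / total) for lang, n in counts.most_common(top_k)]

# Toy labels for 10 tweets.
labels = ["en"] * 5 + ["ja"] * 3 + ["pt"] * 2
print(language_shares(labels, top_k=2))  # [('en', 50.0), ('ja', 30.0)]
```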
Sampled 2,000 random tweets
- 2 human judges for each of the top 10 languages
- Detection problems with French, German, and Malay
Accuracy of Language Detection
- Two types of errors: poor recognition of "tweet English", and tweets with only 1-2 words
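Checking the detector against the two human judges can be sketched as a per-language accuracy. This is an illustration, not the study's protocol: it assumes a tweet is scored only when both judges agree, and it groups results by the judges' label.

```python
def detection_accuracy(samples):
    """Per-language accuracy of a language detector against two human judges.

    Each sample is (detected, judge1, judge2). Assumption: a tweet is scored only
    when both judges agree, and results are grouped by the judges' label.
    """
    correct, total = {}, {}
    for detected, judge1, judge2 in samples:
        if judge1 != judge2:
            continue  # no agreed gold label, so skip the tweet
        total[judge1] = total.get(judge1, 0) + 1
        correct[judge1] = correct.get(judge1, 0) + int(detected == judge1)
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy sample: one English tweet misdetected, one German tweet correct, one disputed tweet.
samples = [("en", "en", "en"), ("fr", "en", "en"), ("de", "de", "de"), ("ms", "id", "ms")]
print(detection_accuracy(samples))  # {'en': 0.5, 'de': 1.0}
```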
Per-language recommendations:
- Korean: recommend conversational tweets
- German: promote tweets with URLs
English serves as a hub language
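One simple way to see English acting as a hub is to count bilingual users per language pair and total the pair counts touching each language. This is a sketch of that idea, not necessarily the measure used in the study; the `user_langs` input format and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def language_pair_counts(user_langs):
    """Count how many users tweet in each pair of languages.

    `user_langs` maps a user id to the set of languages detected in their tweets.
    """
    pairs = Counter()
    for langs in user_langs.values():
        for a, b in combinations(sorted(langs), 2):
            pairs[(a, b)] += 1
    return pairs

def hub_score(pairs):
    """Sum of pair counts involving each language; a hub language scores highest."""
    score = Counter()
    for (a, b), n in pairs.items():
        score[a] += n
        score[b] += n
    return score

users = {"u1": {"en", "ja"}, "u2": {"en", "pt"}, "u3": {"en", "id"}, "u4": {"ja"}}
print(hub_score(language_pair_counts(users)).most_common(1))  # [('en', 3)]
```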
Implications: language barriers must be understood when building a global network
- building a global community
- the need for information brokers between languages
Visible Social Signals from Shared Items (Chen et al., CHI 2010/CHI 2011)
- After a day without Wi-Fi, Ed would like a summary of what's happening in his social stream
- Eddi - Summarizing Social Streams
--> What's happened since you last logged in
--> A tag cloud of entities that were mentioned
--> A topic dashboard where tweets are organized into categories you can drill into
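The two Eddi views listed above, "since you last logged in" and an entity tag cloud, can be approximated in a few lines. This is a toy sketch, not Eddi's implementation: real entity extraction is replaced by counting capitalized tokens, and the tweet dict layout is an assumption.

```python
from collections import Counter
from datetime import datetime

def since_last_login(tweets, last_login):
    """Keep only tweets posted after the user's last login."""
    return [t for t in tweets if t["time"] > last_login]

def entity_cloud(tweets, top_k=5):
    """Crude tag-cloud weights: capitalized tokens stand in for extracted entities."""
    counts = Counter()
    for t in tweets:
        for token in t["text"].split():
            word = token.strip(".,!?:;\"'()")
            if word[:1].isupper():
                counts[word] += 1
    return counts.most_common(top_k)

tweets = [
    {"time": datetime(2011, 10, 26, 9), "text": "Old news about Google"},
    {"time": datetime(2011, 10, 27, 9), "text": "Google+ opens to everyone"},
    {"time": datetime(2011, 10, 27, 10), "text": "CIKM talk by Ed Chi on Google+"},
]
new = since_last_login(tweets, datetime(2011, 10, 27, 0))
print(entity_cloud(new))  # [('Google+', 2), ('CIKM', 1), ('Ed', 1), ('Chi', 1)]
```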
Information Gathering/Seeking
- The Filtering Problem: I get 1,000+ things in my stream, but only have time for 10. Which ones should I read?
- The Discovery Problem
-- millions of URLs are posted
Zerozero88.com
- Twitter as the platform
- URLs as the medium
- a personal newspaper that produces personal headlines
URL Sources (from tweets) -> Topic Relevance Model and Social Network Model
URL Sources
- Considering all URLs was impossible, so two candidate sources are used:
-- FoF: URLs from followees-of-followees
--> socially local news is better
-- Popular: URLs that are popular across all of Twitter
--> popular news is better
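The FoF source amounts to a two-hop walk on the follow graph. A minimal sketch; the dictionaries standing in for the Twitter follow graph, and the choice to drop accounts the user already follows (since the goal is discovery), are assumptions.

```python
def fof_urls(user, follows, posted_urls):
    """Collect URLs posted by followees-of-followees of `user`.

    follows: dict mapping an account to the set of accounts it follows (hypothetical)
    posted_urls: dict mapping an account to the URLs it tweeted (hypothetical)
    """
    followees = follows.get(user, set())
    fof = set()
    for f in followees:
        fof |= follows.get(f, set())
    fof -= followees | {user}  # assumption: keep only accounts not already followed
    urls = set()
    for account in fof:
        urls.update(posted_urls.get(account, []))
    return urls

follows = {"me": {"a", "b"}, "a": {"c", "me"}, "b": {"c", "d", "a"}}
posted = {"a": ["u0"], "c": ["u1", "u2"], "d": ["u2", "u3"]}
print(sorted(fof_urls("me", follows, posted)))  # ['u1', 'u2', 'u3']
```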
Topic Relevance Model
- A user tweets about things, which creates a term-vector profile.
- URLs are scored by cosine similarity between the user's profile and each URL's profile.
- Topic profile of URLs: built from the tweets that contain the URL, but tweets are short and retweets skew word frequencies.
- To compensate, they adopt a term expansion technique: extract nouns from the tweet and feed them into a Wikipedia search engine as a topic detection technique.
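The scoring step can be sketched as cosine similarity between sparse term vectors. This sketch uses raw term frequencies and whitespace tokenization, and it omits the noun extraction and Wikipedia-based term expansion; the example texts are made up.

```python
import math
from collections import Counter

def term_vector(texts):
    """Bag-of-words term-frequency vector built from a list of short texts."""
    return Counter(word.lower() for text in texts for word in text.split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (Counters)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

user = term_vector(["search ranking at Google", "ranking models"])
url_a = term_vector(["new ranking models paper"])
url_b = term_vector(["cat pictures"])
print(cosine(user, url_a) > cosine(user, url_b))  # True
```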
Topic Profile of User
- Self-topic
- Information producer - the things they tweet about
- Information gatherer - what they like to read
- Build profiles from both and aggregate them.
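Aggregating the producer and gatherer profiles could be as simple as a weighted blend of normalized term vectors. The linear blend and the `alpha` weight are assumptions for illustration; the talk did not say how the profiles were combined.

```python
from collections import Counter

def normalize(vec):
    """Scale a term-frequency Counter so its weights sum to 1."""
    total = sum(vec.values())
    return {t: w / total for t, w in vec.items()} if total else {}

def aggregate_profile(producer, gatherer, alpha=0.5):
    """Blend the 'information producer' and 'information gatherer' profiles.

    alpha is a hypothetical mixing weight; alpha=1 keeps only what the user tweets.
    """
    p, g = normalize(producer), normalize(gatherer)
    return {t: alpha * p.get(t, 0.0) + (1 - alpha) * g.get(t, 0.0) for t in p.keys() | g.keys()}

producer = Counter({"ranking": 2, "search": 2})  # from the user's own tweets
gatherer = Counter({"ranking": 1, "music": 3})   # from what the user reads
profile = aggregate_profile(producer, gatherer)
print(profile["ranking"])  # 0.375
```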
Social Module
- Take the FoF neighborhood and count the votes for each URL
- Simple counting doesn't work very well.
- Votes are weighted using social network structure
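The talk did not give the exact weighting. The sketch below assumes one plausible scheme: each second-hop voter's vote is log-damped by how many of the user's followees follow that voter. The graph dictionaries and the weighting are hypothetical.

```python
import math
from collections import defaultdict

def weighted_votes(user, follows, posted_urls):
    """Score candidate URLs by weighted votes from followees-of-followees (assumed scheme)."""
    followees = follows.get(user, set())
    # links[voter] = how many of the user's followees follow this voter
    links = defaultdict(int)
    for f in followees:
        for voter in follows.get(f, set()):
            links[voter] += 1
    scores = defaultdict(float)
    for voter, k in links.items():
        if voter == user or voter in followees:
            continue  # assumption: only second-hop accounts vote
        weight = math.log(1 + k)
        for url in set(posted_urls.get(voter, [])):
            scores[url] += weight  # each voter votes at most once per URL
    return dict(scores)

follows = {"me": {"a", "b"}, "a": {"c"}, "b": {"c", "d"}}
posted = {"c": ["u1"], "d": ["u1", "u2"]}
scores = weighted_votes("me", follows, posted)
print(sorted(scores, key=scores.get, reverse=True))  # ['u1', 'u2']
```

With plain counting, `c` and `d` would count equally; here `c`, followed by two of the user's followees, carries more weight.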
Study Design
- Each subject evaluated 5 URL recommendations from each of the 12 algorithms: 60 URLs shown in random order, each given a binary (relevant / not relevant) rating.
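The survey construction can be sketched as pooling 5 URLs from each algorithm and shuffling them. Assumption: the 12 algorithms are the full cross product of 2 URL sources, 3 topic-relevance options (none, self, followee), and 2 social options (none, voting); `recommend` is a hypothetical stand-in for the real recommenders.

```python
import itertools
import random

# Assumption: 2 sources x 3 topic options x 2 social options = 12 algorithms.
SOURCES = ["FoF", "Popular"]
TOPIC = ["none", "self", "followee"]
SOCIAL = ["none", "vote"]

def build_survey(recommend, per_algorithm=5, seed=0):
    """Pool the top URLs from every algorithm and shuffle them for blind rating.

    `recommend(source, topic, social, n)` is a hypothetical function returning n URLs.
    The algorithm label travels with each URL for scoring but is not shown to subjects.
    """
    items = []
    for source, topic, social in itertools.product(SOURCES, TOPIC, SOCIAL):
        for url in recommend(source, topic, social, per_algorithm):
            items.append((url, (source, topic, social)))
    random.Random(seed).shuffle(items)
    return items

def fake_recommend(source, topic, social, n):
    return [f"{source}/{topic}/{social}/{i}" for i in range(n)]

survey = build_survey(fake_recommend)
print(len(survey))  # 60
```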
Summary of Results
- Global popularity (1%): 32.50% relevant; not bad, but not good enough
- FoF only: 33%; naive by itself, without voting it doesn't work well
- FoF voting (social voting only): 65%
- Popularity voting: 67%
- FoF Self-Vote: 72%, best performing
Algorithms differ not only in accuracy!
- Relevance vs. serendipity in recommendations (tension between discovery and affirmation)
-> "What I crave is surprising, interesting, whimsy": this is where the value is
-> Two elements of surprise: 1) have I seen this before? 2) non-obvious relationships between things
Design Rule
- Interaction costs determine the number of people who participate
- Reduce interaction costs and you can get many more people into the system
- For Google+, delivering this to people is key
Q&A:
Japanese crams more information into a tweet, and in these environments it is used more for conversation than for broadcast.