A few key points to remember. First, we have to keep this at the forefront:
It's not about the technology; it's about how it enriches our lives and makes them better.
The Data
160 million accounts. 90 million tweets per day. 16.7 GB of tweets. > 1,000 tps.
200,000 timeline rps, 3GBs outbound data, 1 billion queries per day.
Tweets are searchable within seconds and the data is kept forever.
About 30% of search traffic is generated by clicks from trending topics.
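As a quick sanity check, the daily figures above can be turned into per-second averages. This is only a back-of-the-envelope sketch: it assumes "tps" means tweets per second, that the 16.7 GB figure is a daily volume in decimal gigabytes, and that load is spread evenly across the day, none of which is stated explicitly in the talk notes. The variable names are mine.

```python
# Back-of-the-envelope averages from the figures quoted in the talk notes.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 90_000_000
queries_per_day = 1_000_000_000
tweet_bytes_per_day = 16.7e9  # assumption: 16.7 GB is per day, decimal GB

# Average rates, assuming load is uniform over the day (real traffic is bursty).
avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY
avg_queries_per_second = queries_per_day / SECONDS_PER_DAY
avg_bytes_per_tweet = tweet_bytes_per_day / tweets_per_day

print(f"average tweets per second:  {avg_tweets_per_second:,.0f}")
print(f"average queries per second: {avg_queries_per_second:,.0f}")
print(f"average bytes per tweet:    {avg_bytes_per_tweet:,.0f}")
```

Under those assumptions, 90 million tweets per day averages out to just over 1,000 tweets per second, which lines up with the "> 1,000 tps" figure, and 1 billion queries per day is roughly 11,600 queries per second on average.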
In 1 ms, answer the following about a tweet:
- What language is this tweet?
- Where was this tweet posted from?
- What are the entities in this tweet?
Every X min answer the following:
- Which tweets should you ignore?
- What topics are trending and where?
A key problem is how to evaluate the quality of trending topics. What makes one topic 'better' than another?
One of the coolest things I saw in the talk was the visualization of the World Cup tweets, which appeared on their blog as "World Cup 2010: A Global Conversation." It was created by Miguel Rios, whose work you can check out on his website.
Abdur ended with an admonition to researchers to think about the impact of their work:
Why does your research matter? Will it make the world a better place?