Wednesday, December 1

Google Fixes Problem

This afternoon Amit Singhal, from Google Search Quality, wrote a blog post about how Google fixed the recent fiasco. The story was broken by the NY Times article exposing how a disreputable merchant gained a high ranking by being mean to his customers. He gained links and reputation by being written up negatively on many popular and important sites.

What's interesting in Amit's post is the insight into how Google's team approached solving the problem. They considered and ruled out three options:

1) Blocking only the offending site would not solve the underlying problem.
2) Sentiment analysis wouldn't solve the problem because reputation was coming from neutral news sites with solid reputations. Google has not yet found a useful way to incorporate sentiment into ranking.
3) Exposing the reviews and ratings next to the results would not actually alter the ranking.

Their new, undisclosed fix detected the problematic merchant and several hundred other bad apples.

Tuesday, November 30

Lectures on User Behavior Modeling and Implicit Feedback from Query Logs

I am one of the TAs for the graduate IR course at UMass this semester. I recently gave two lectures on modeling user behavior and utilizing implicit user feedback from logs.

User Behavior Modeling. I covered models of information seeking behavior. Then, I went over the Google 3M (micro-, meso-, and macro-) characterizations of interactions. We looked at how we learn about these various levels of interactions through field and lab studies, instrumented panels, and query logs.

Implicit User Feedback. We finished up query log analysis, including query classification and applications like disambiguation and trends. Most of the time was spent on interpreting clickthrough and browsing behavior to generate preference and relevance data.
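To make the idea of turning clicks into preference data concrete, here is a minimal sketch of one well-known heuristic, usually called "click > skip above": a clicked result is treated as preferred over any unclicked result ranked above it. This is only an illustration under that assumption, not necessarily the exact strategy covered in the lecture; the function name and the toy ranking are hypothetical.

```python
from typing import List, Set, Tuple


def skip_above_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs for one result page.

    Assumption: the user scanned results top-down, so every unclicked
    result above a clicked one was seen and skipped.
    """
    preferences = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Pair the clicked document with each unclicked document ranked above it.
        for skipped in ranking[:position]:
            if skipped not in clicked:
                preferences.append((doc, skipped))
    return preferences


# Hypothetical log entry: four results shown, only the third was clicked.
ranking = ["d1", "d2", "d3", "d4"]
pairs = skip_above_preferences(ranking, {"d3"})
print(pairs)
```

Note that the heuristic says nothing about d4: results below the last click may never have been looked at, so no preference is inferred for them.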

If you want to learn more, a lot of the lectures build on materials from Eugene Agichtein's tutorial on Inferring User Intent at WWW 2010. If you want more detail, their intent project is a good place to start.

Call for participation of Academic IR community in Lucene

Otis Gospodnetic, a committer on the Lucene project, put out a call on the SemaText blog for greater engagement of academia with the open source Solr/Lucene community. In particular, he is seeking ideas for advanced topics that would be worthy of an MS/PhD thesis and that would be implemented and contributed to the community.

If you have ideas, please add them to the public idea spreadsheet he started. I strongly encourage you to go there and contribute.

Lucene is the most widely used search engine library. If important new academic ideas that improve retrieval get incorporated, the impact would be huge.

However, historically, the Lucene community and academia have been kept very separate. Instead, research teams have developed their own systems. The fragmentation is apparent if you look at my list of open source search libraries.

Lucene's ranking algorithms are dated, and the library is inflexible and difficult to change. Because it is so widely adopted, it is hard to modify and extend in radical ways. If academia is going to get involved, some of these issues need to be addressed. A lot of that is straightforward engineering work that would make Lucene a better research platform.