Monday, July 7

A variety of diverse opinions: IBM Many Aspects Text Summarization Tool

IBM recently posted a new text summarization tool, IBM Many Aspects, on their alphaWorks site. Many Aspects is a Java tool that is available for download. Its goal seems to be to summarize documents that cover multiple topics or viewpoints.

From the description:
These sentences are picked using the following two criteria:
  • Coverage: The sentences should span a large portion of the spectrum of the document's subject matter.
  • Orthogonality: Each sentence should capture different aspects of the document's content. That is, the sentences in the summary should be as orthogonal to each other as possible.

...For example, in online comments and discussions following blogs, videos, and news articles, it is desirable to have a summary that highlights different angles of these comments because each often has a different focus. With IBM Many Aspects Document Summarization Tool, you can get a concise yet comprehensive overview of the document without having to spend lots of time drilling down into the details.
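To make the two criteria concrete, here is a rough Python sketch of how a summarizer could trade coverage against orthogonality. This is not IBM's implementation and not the algorithm from the paper; it is a simple greedy heuristic of my own for illustration. The function names, the bag-of-words representation, and the `penalty` parameter are all assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_sentences(sentences, k=2, penalty=1.0):
    """Greedily pick up to k sentence indices (hypothetical helper).

    Coverage is approximated by similarity to the whole document;
    orthogonality by penalizing similarity to sentences already chosen.
    """
    vectors = [bag_of_words(s) for s in sentences]
    document = sum(vectors, Counter())  # word counts for the full text
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, vec in enumerate(vectors):
            if i in chosen:
                continue
            coverage = cosine(vec, document)
            redundancy = max((cosine(vec, vectors[j]) for j in chosen), default=0.0)
            score = coverage - penalty * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Toy "comments": two about plot and acting, one about the music.
comments = ["plot plot acting", "plot acting", "music loud"]
print(pick_sentences(comments, k=2))  # [0, 2]
```

With the redundancy penalty switched off (`penalty=0.0`), the same call returns the two near-duplicate plot comments instead, which is exactly the kind of summary the orthogonality criterion is meant to avoid.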

The research behind the tool is also available: "ManyAspects: A System for Highlighting Diverse Concepts in Documents" by Kun Liu, Evimaria Terzi, and Tyrone Grandison, from VLDB 2008.

From the paper, the primary use case seems to be summarizing user opinions about a movie or a product. In these cases, it's useful to identify the different aspects of the product or movie being discussed.

If you get a chance to try it out, I'd love to hear what you think. Is it useful?

