Monday, January 7

Semantic tagging of Wikipedia and a workshop at ECIR

The European Conference on Information Retrieval (ECIR 2008) is coming up, running from March 30th to April 3rd at Glasgow University. I would love to attend, but it doesn't look likely.

One workshop that I would like to attend is Exploiting Semantic Annotations in Information Retrieval, organized by Omar Alonso from A9 and Hugo Zaragoza from Yahoo! Barcelona. From the description:

By semantic annotations we refer to linguistic annotations (such as named entities, semantic classes, etc.) as well as user annotations such as microformats, RDF, tags, etc. We are not interested in the annotations themselves, but on their application to information retrieval tasks such as ad-hoc retrieval, classification, browsing, textual mining, summarization, question answering, etc...

In particular, techniques have been developed to ground named entities in terms of geo-codes, ISO time codes, Gene Ontology ids, etc. Furthermore, the number of collections which explicitly identify entities is growing fast with Web 2.0 and Semantic Web initiatives...

Despite the growing number and complexity of annotations, and despite the potential impact that these may have in information retrieval tasks, annotations have not yet made a significant impact in Information Retrieval research or applications. Further research is needed before we can unleash the potential of annotations.

There have been some recent efforts to automatically annotate Wikipedia with semantic information. For example, Hugo and other Yahoo researchers made available a Semantically Annotated Snapshot of the English Wikipedia (SW1).

Also, in the paper Autonomously Semantifying Wikipedia, Fei Wu and Daniel Weld from the University of Washington describe the KYLIN system, which automatically extracts semantic information from Wikipedia, with two main goals:
  1. Automatically generating "infoboxes", the concise tabulated summaries of a subject's attributes
  2. Autonomously linking articles to create useful structure between articles
And of course, as I have mentioned in the past, there is Freebase, a structured version of Wikipedia.

