Wednesday, February 24

Wired Article on Google's Algorithm: Thoughts on Synonyms

I haven't been writing much recently; I was a bit burnt out after paper season. I submitted a short paper on synonym recognition to ACL 2010 and hope to share more on that in the future. On the topic of synonyms, the recent Wired article "How Google's Algorithm Rules the Web" briefly mentions Google's synonym recognition algorithm.

Towards the middle of the article, Amit Singhal talks about synonyms. The first part covers the straightforward mappings identified from query reformulations. I think the more interesting case is when you don't have millions of reformulations to learn from. There, you can use the context in which words appear in web documents. Here's the relevant section:
Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context... “Today, if you type ‘Gandhi bio,’ we know that bio means biography,” Singhal says. “And if you type ‘bio warfare,’ it means biological.”
This type of query-sensitive synonym usage is quite important for web retrieval: the right expansion for a term depends on the other words in the query.
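To make the "Gandhi bio" vs. "bio warfare" example concrete, here is a toy sketch of query-context-sensitive synonym selection in the spirit of the "words are defined by context" idea. It is not Google's method, which the article doesn't describe. The candidate expansions and their context words (`CANDIDATE_CONTEXTS`) are hypothetical toy data standing in for statistics mined from web documents, and scoring by bag-of-words cosine similarity is my own illustrative choice. For each ambiguous term, it picks the candidate whose document context best matches the other words in the query, and leaves the term alone when the query gives no evidence.

```python
import math
from collections import Counter

# Hypothetical toy data: for an ambiguous query term, each candidate expansion
# is described by words that tend to co-occur with it in documents. A real
# system would mine these contexts from a large web corpus.
CANDIDATE_CONTEXTS = {
    "bio": {
        "biography": ["life", "born", "gandhi", "career", "early", "biography"],
        "biological": ["warfare", "weapons", "agents", "toxin", "biological"],
    }
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(query: str) -> list[str]:
    """Rewrite ambiguous terms using the rest of the query as context."""
    terms = query.lower().split()
    out = []
    for i, term in enumerate(terms):
        candidates = CANDIDATE_CONTEXTS.get(term)
        if not candidates:
            out.append(term)
            continue
        # The other query terms form the context for this term.
        context = Counter(terms[:i] + terms[i + 1:])
        scores = {c: cosine(context, Counter(words)) for c, words in candidates.items()}
        best = max(scores, key=scores.get)
        # Keep the original term if no candidate matches the query context.
        out.append(best if scores[best] > 0 else term)
    return out


print(expand("Gandhi bio"))   # ['gandhi', 'biography']
print(expand("bio warfare"))  # ['biological', 'warfare']
```

The same surface term gets different expansions depending on its neighbors, which is exactly what a single global synonym table (dog → puppy, hot → boiling) can't do.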

See also my previous post on Google's synonym effectiveness and Google's recent patent on using query context to determine synonyms.
