Monday, December 5

RawSugar: the first tag-based web search engine

A colleague asked the question: What lies beyond link text when it comes to search engine relevance? One possible answer is tagging. John Battelle recently posted on his blog: Will Tagging Work? I have started to think about this in a web search context, and I'm not sure I have any answers, but here is at least an introduction...

There are some that think the "next big thing" is tagging. This is all part of the "Web 2.0" way of doing things where users generate content. The most famous examples of these models are Wikipedia, Flickr, and

The question is, can this be extended to the web as a whole? Search engines crave high-quality metadata about web pages. Their first resort is sophisticated computer algorithms, such as clustering, to derive that metadata. However, sometimes humans can provide more insightful data. Users can generate this data explicitly by tagging URLs directly, or implicitly as a by-product of using the service, even by playing a game. One of the coolest examples I have seen of this type of system is the ESP Game, an attempt by CMU researchers to get users to label image data. In fact, it is entitled "The ESP Game: Labeling the web". The incentive is very compelling -- addictive fun.

One group trying to build a high-quality, tag-based social search engine is RawSugar. There is an interesting interview with its founder over on Free Internet Radio (Thanks to's Weblog). At first glance RawSugar may appear to be another Delicious rip-off. However, it is more than a social bookmarking platform -- it is the first real tag-based social search engine. A RawSugar employee provides a good description of this differentiation over on TechCrunch:

Most importantly, our search is not the same as and most (though not all) of the other sites in the tagging space–we search the tags, notes and full text of pages saved into our system while, at least for now, only searches tags and, i think, notes.

RawSugar is angel funded, with about ten engineers working on the engine. They have just made some very interesting service upgrades; check out their blog for details. According to a recent interview with CEO Ben-Shachar, they are using an interesting mix of technologies, including PostgreSQL and Lucene. Lucene is an Apache project -- a very popular open source indexing library, written in Java and ported to other languages.
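To make the distinction in the quote above concrete, here is a minimal sketch in Python of the difference between searching tags only and searching tags, notes, and full page text. It is not RawSugar's implementation and does not use Lucene; the bookmark records, field names, and helper functions are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical saved pages; the URLs, fields, and text are invented for this sketch.
BOOKMARKS = [
    {"url": "http://example.com/a", "tags": ["search"],
     "notes": "good overview", "text": "An introduction to tagging and folksonomy."},
    {"url": "http://example.com/b", "tags": ["tagging"],
     "notes": "", "text": "Photo sharing with labels."},
    {"url": "http://example.com/c", "tags": ["photos"],
     "notes": "about tagging images", "text": "A game for labeling pictures."},
]


def tokenize(value):
    """Lower-case a string (or list of strings) and split it into simple terms."""
    if isinstance(value, list):
        value = " ".join(value)
    return [w.strip(".,").lower() for w in value.split()]


def build_index(bookmarks, fields):
    """Build a term -> set-of-URLs inverted index over the chosen fields only."""
    index = defaultdict(set)
    for bookmark in bookmarks:
        for field in fields:
            for term in tokenize(bookmark[field]):
                if term:
                    index[term].add(bookmark["url"])
    return index


def search(index, term):
    """Return the sorted URLs whose indexed fields contain the term."""
    return sorted(index.get(term.lower(), set()))


tags_only = build_index(BOOKMARKS, ["tags"])
all_fields = build_index(BOOKMARKS, ["tags", "notes", "text"])

print(search(tags_only, "tagging"))
print(search(all_fields, "tagging"))
```

With this toy data, the tags-only index finds a single page for "tagging", while the index over all three fields also finds the pages that merely mention the word in their notes or body text. That wider recall is the differentiation the quote is describing.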

Right now I would say RawSugar is more of an experiment than anything else -- it has only about 135,000 pages indexed (my own estimate, based on stop word tests, is about 170k) and an undisclosed number of users. If it can scale and attract a sizable user base it could be something to watch. At the very least, it is an experiment to learn from.

Rollyo is another search engine, one that takes a more implicit approach to tagging. It allows people to create their "own custom verticals" by performing restricted searches across a collection of sites organized into a "Roll". One of the by-products of creating a roll is a human-created cluster of sites organized under an informative title and keywords. One of the biggest questions I have about Rollyo is: Can it scale? Users are currently limited to 20 sites in a roll, and you can only search one roll at a time. Is being able to restrict a keyword search to a list of websites enough incentive to use the service? I'm not convinced -- I think there is a lot of potential, but will it catch on? What compelling new features does it offer to get people to switch?
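The idea of a roll, as described above, can be sketched in a few lines of Python: a titled list of at most 20 sites, and a keyword search that only returns pages hosted on those sites. This is an illustration under my own assumptions, not Rollyo's actual design; the function names, the data layout, and the substring matching are all hypothetical.

```python
from urllib.parse import urlparse

MAX_SITES_PER_ROLL = 20  # the per-roll limit mentioned in the post


def make_roll(title, sites):
    """Create a roll: a titled, human-chosen set of site hostnames."""
    if len(sites) > MAX_SITES_PER_ROLL:
        raise ValueError("a roll may contain at most %d sites" % MAX_SITES_PER_ROLL)
    return {"title": title, "sites": {s.lower() for s in sites}}


def in_roll(url, roll):
    """True if the URL's host is one of the roll's sites or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in roll["sites"])


def search_roll(pages, roll, keyword):
    """Keyword search over (url, text) pairs, restricted to the sites in one roll."""
    keyword = keyword.lower()
    return [url for url, text in pages
            if in_roll(url, roll) and keyword in text.lower()]


# Hypothetical crawled pages, invented for this sketch.
pages = [
    ("http://www.example.org/a", "Tagging basics"),
    ("http://other.example.net/b", "Tagging too"),
    ("http://example.org/c", "Nothing here"),
]
roll = make_roll("demo roll", ["example.org"])
print(search_roll(pages, roll, "tagging"))
```

Note that the roll itself (a title plus a hand-picked set of sites) is exactly the human-created cluster mentioned above; the search restriction is just a filter on top of it.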

The questions that these and others are trying to answer are: How can search engines get users to tag web pages with usable content as a by-product of their daily surfing? What incentives motivate users to provide reliable and useful tags? And lastly, how can search engines handle spam in these tagging systems?

To sum it all up, I'm not sure if tagging will work. Right now I have more questions than answers -- and the questions are still fairly nebulous. I hope to refine these questions when I attend the WWW 2006 conference and hopefully attend the Collaborative Web Tagging workshop on May 21st. RawSugar, Yahoo, and other major players will be taking part, so I have high hopes for an interesting discussion. More on the confirmed speakers at the RawSugar blog...

More reading:
Social Consequences of social tagging
There is also a paper available via the ACM on the ESP game:
"Labeling images with a computer game"


  1. Anonymous, 1:13 AM EST

    hardly. other people started here before this -- wink, simpy, delicious, etc.

  2. My post differentiates between "web search engines" and other types of tagging applications. I would argue that Simpy and Delicious are bookmarking applications with search functionality -- not tag-based search engines. On a side note, Simpy, written by Otis Gospodnetic, is also powered by Lucene.

    Wink is more interesting because the line is a little blurrier. It is still in semi-closed beta; it is not public yet. At this point, I see it more as a tagging platform with Google-powered web search tacked on than as a tagging search engine. There is definitely a lot of potential if they can find a way to blend web and tag search together.

    Another interesting one is Shadows, but again, it is another social platform that allows bookmark sharing, community blogging, etc... I think Shadows and Wink deserve more attention than I have given them, and I will hopefully write more on them soon. However, I don't think I would call either a web search engine.

    I still think RawSugar is the closest thing we have seen to a search engine. However, perhaps it doesn't qualify as a web search engine because they don't have a corpus collected via web crawling. Still, their UI and full corpus search is the closest to a search engine that I've seen which is currently available to the public.