Saturday, December 10

Yahoo Having a Wiki Good Time

With all the major buzz over Yahoo's acquisition of social sites, I thought it would be interesting to look at one that Yahoo hasn't acquired yet -- Wikimedia. I guess it's not surprising, but what I found was that the two are already collaborating.

Back in April, Yahoo and Wikimedia announced that Yahoo would be donating server and bandwidth resources in Asia. You can read the full press release on Wikimedia's website. According to the press release, "Yahoo! is one of the Wikimedia Foundation's earliest corporate supporters."

At this time Jimmy Wales, the creator of Wikipedia, was guest blogger on Yahoo's search blog. Jimmy Wales clearly says that Yahoo's relationship is purely "charitable." Jimmy says in his post:
As our relationship with Yahoo has grown over the past year, we began to talk about other ways that Yahoo could help us. One theme that made sense for both of us was to think about Yahoo's global reach and Wikipedia's global goals.
In what ways are Yahoo and Wikimedia going to collaborate on global expansion? What are other ways that Yahoo will help Wikimedia? I wonder how this relationship will evolve.

Yahoo wants Wikipedia to succeed because it has a lot at stake in this new social movement -- Yahoo has a large amount of capital, tens of millions, invested in seeing it succeed. Wikipedia's continued prosperity and growth lends credence to Yahoo's new community initiatives.

More thoughts on this to come...

Friday, December 9

Yahoo acquires Delicious -- The $30 Mln Firefox extension

The news broke this afternoon on Tech Crunch, Yahoo's Search Blog, the Delicious Blog, Jeremy's blog, Om -- and then all over the place: Yahoo has purchased Delicious.

According to Battelle, this didn't come cheap. He says his sources peg the deal at around $30-35 million, and other rumors put the figure closer to $40 million. Update: Om and others estimate it to be lower. Personally, if I had to hazard a guess, it would be something on the order of $15-20 million. My estimate is in line with umair's, who assumes a street value in the range of $10 million and supposes that Delicious could command a premium well over and above that.

Yahoo was acquiring the one thing My Web 2.0 lacked -- a good Firefox extension. The Firefox extension converted at least one user, Nick Wilson, who defected back to Delicious because of this dearth in My Web. Is Delicious' user community and tag meta-data worth $10-40 million? How many Firefox extensions did they buy for this chunk of change? Now, in all seriousness, if not for the Firefox extension, Why Did Yahoo Really Buy Delicious?

The rationale for this acquisition is obvious: the community, of course! It's not about acquiring cool technology. It's about increasing the value of My Web 2.0 through the Network Effect, which states that the perceived value of a service depends on the number of users already using it. See more about the network effect and Metcalfe's law on Wikipedia.
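To get a feel for why community size matters so much under Metcalfe's law, here is a minimal sketch. It assumes the common reading of the law, that a network's value tracks the number of possible user-to-user connections; the function name and the community sizes are made up purely for illustration and are not figures for Delicious or My Web 2.0.

```python
def potential_connections(users: int) -> int:
    """Number of distinct user pairs in a network with `users` members."""
    return users * (users - 1) // 2


# Hypothetical community sizes, chosen only to show how the count scales.
for users in (1_000, 10_000, 100_000):
    pairs = potential_connections(users)
    print(f"{users:>7,} users -> {pairs:,} possible connections")
```

Ten times the users gives roughly a hundred times the possible connections, which is the intuition behind buying a community rather than growing one slowly.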

One thing to note is that Delicious' traffic curve is going in the right direction. According to Alexa, its reach has tripled since October!

Update: BusinessWeek and others estimate the size of Delicious' user community to be 300,000 users. Depending on who you believe, this means that Yahoo paid between $30 and $100 for every user's bookmarks. Maybe I should try auctioning my bookmarks on eBay to pay for some Christmas gifts. Still, three hundred thousand users is a drop in the proverbial bucket to someone as big as Yahoo. However, I would wager that Yahoo's My Web 2.0 doesn't have the same exponential traffic growth curve.
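The per-user figure is just the rumored price divided by the estimated user count. A quick sketch of that arithmetic, using the 300,000-user estimate and the range of price guesses mentioned in this post (none of them confirmed; the function name is my own):

```python
USERS = 300_000  # community size estimated by BusinessWeek and others


def price_per_user(deal_price_usd: float, users: int = USERS) -> float:
    """Dollars paid per user for a given (rumored) deal price."""
    return deal_price_usd / users


# Rumored or guessed deal prices from this post, in US dollars.
for price in (10_000_000, 15_000_000, 20_000_000, 30_000_000, 40_000_000):
    print(f"${price / 1e6:.0f}M deal -> ${price_per_user(price):,.2f} per user")
```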

Let me try to provide an inside look at a hypothetical meeting at Yahoo HQ. Sitting around the table are Terry Semel, Jeremy Zawodny, Jeff Weiner, Jan Pedersen, and other Yahoo execs. Terry asks, "So, how long for My Web 2.0 to reach critical mass?" The project manager gives a presentation with several different projections based on varying marketing spends and so on, but the bottom line is that it is still a ways out there -- he estimates 1-2 years and millions of dollars. The next big question Terry asks is, "How can you make this go faster?" The PM replies, "Well, if we buy a community like Delicious, it will cut the time in half." They talk it over and then someone asks, "What would it cost us if that other company purchases Delicious and uses it to start their own community initiative?" Ouch. They look at each other for a minute and Terry breaks the silence: "Buy it."

The Yahoo leadership recognizes that Delicious in and of itself may not be worth $30-40 million, but being the first to reach critical mass with the "community thing" is what matters. As it stands, Yahoo can use Delicious' amazing growth rate and existing user base to accelerate the growth of My Web 2.0 and take an early lead in the market. If I were them, I would hope that the time it takes to grow a community would provide a barrier to entry for competitors.

Overall, the acquisition isn't surprising, but I can honestly say that it still caught me a bit off guard. I expected Yahoo to try to do it themselves; after all, they had the technology and the marketing money. As I said earlier, I think it really comes down to time and expertise. Joshua and the Delicious users can help Yahoo accomplish their goal faster.

Let's take a moment to compare this with the Flickr purchase. More than the traffic differences, the biggest difference I see is in the audience of the two sites. The Delicious crowd is really super geeky -- look at the "Popular" pages and you can see what I mean. Flickr appeals to a wider audience that is more in line with Yahoo's audience. Perhaps Yahoo can bring Delicious to a wider crowd, but we'll have to see what happens to it. Nick Wilson has a really catchy quote in his post on the acquisition that I wish I had thought of:
So What Happens as Delicious Leaves the Geekerati and Joins the Mainstream?
This acquisition will surely alienate some cyber geeks in the tech community. If you check out the comments on Digg, many are not happy and accuse Delicious of "selling out."

Ho John Lee posted some very interesting ideas in his response to the news. Here is what he had to say that I found the most interesting:
Tagged bookmarking sites such as [Delicious] can provide a rich source of input data for developing contextual and topical search. The early adopters that have used [Delicious] up to this point are unlikely to bookmark spam or very uninteresting pages, and the aggregate set of bookmarks and tags is likely to expose clustering of links and related tags which can be used to refine search results by improving estimates of user intent. Individuals are becoming their own search engine in a very personal, narrow way, which could be coupled to general purpose search engines such as Yahoo or Google.
Think of millions of users bookmarking sites. The early adopters might not bookmark spam, but will a wider audience? What about all of the SEOs who realize that creating accounts and bookmarking pages gets them more traffic in the context of a larger Yahoo audience? Finding interesting relationships in the user data is a veritable mountain of gold. The question is will this gold tarnish as it grows?

All of these acquisitions present Yahoo with some really cool properties, but also some interesting problems. How is Yahoo going to integrate Delicious and Flickr with My Web 2.0? How do you keep the fans happy while integrating all of these pieces into the bigger platform?

In a broader context, this acquisition will likely have ripple effects throughout the Web 2.0 community. Will it be a boon or a bane? If Yahoo is smart, they will provide ways for new services to leverage the Web 2.0 / Delicious platform and layer services on top of it. Delicious' lavish reward should also spur more "Web 2.0" startups to try to jump on the bandwagon. If some of them become successful, will Yahoo continue to interoperate with them, squash them, or buy them too?

I think this acquisition is a good thing for both Yahoo and Delicious. It's a win-win. Yahoo gets a lot of users and tag content to bootstrap their platform. Delicious gets cash, but it also gets the resources to take their business to a whole new level inside the Yahoo network.

My congratulations to Joshua, Fred, and the rest of the Delicious team.

Update: Greg also raises a lot of the same points as I do, albeit a bit more eloquently and succinctly. He also raises some even more interesting questions: Did they sell too soon? If this whole thing works out, Delicious was in a great position to do better later on. A lot of search companies back in the dot-com boom were bought up by large media companies only to be neglected and later abandoned. If one of those had stuck it out, it might have been the next Google. Another interesting point he makes is that this acquisition might lead to the perception that Web 2.0 companies are "in it to flip it." What are the long-term business models?

Monday, December 5

RawSugar the first tag based web search engine

A colleague asked the question: What lies beyond link text when it comes to search engine relevance? One possible solution is tagging. John Battelle recently posted on his blog: Will Tagging Work? I have started to think about this in a web search context and I'm not sure I have any answers, but here is at least an introduction...

There are some who think the "next big thing" is tagging. This is all part of the "Web 2.0" way of doing things where users generate content. The most famous examples of these models are Wikipedia, Flickr, and Delicious.

The question is, can this be extended to the web as a whole? Search engines crave high quality meta-data about web pages. First, they use sophisticated computer algorithms, like clustering, to derive meta-data. However, sometimes humans can provide more insightful data. Users can generate this data explicitly by tagging URLs directly, or implicitly as a by-product of using the service, even by playing a game. One of the coolest examples I have seen of this type of system is the ESP Game, an attempt by CMU researchers to get users to label image data. In fact it is entitled "The ESP Game: Labeling the web". It offers a very compelling incentive -- addictive fun.

One group trying to build a high quality social network-tag-based search engine is RawSugar. There is an interesting interview with its founder over on Free Internet Radio (Thanks to's Weblog). At first glance RawSugar may appear to be another Delicious rip-off. However, it is more than a social bookmarking platform -- it is the first real tag-based social search engine. A RawSugar employee provides a good description of this differentiation over on Tech Crunch:

Most importantly, our search is not the same as [Delicious] and most (though not all) of the other sites in the tagging space–we search the tags, notes and full text of pages saved into our system while, at least for now, [Delicious] only searches tags and, i think, notes.
RawSugar is angel funded with about ten engineers working on the engine. They have just made some very interesting service upgrades; check out their blog for details. According to a recent interview with CEO Ben-Shachar they are using an interesting mix of technologies, including PostgreSQL and Lucene. Lucene is an Apache project -- a very popular open source indexing library, in Java and other languages.
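To make the difference described in the quote above concrete, here is a toy sketch of tag-only search versus search over tags, notes, and full page text. It is only an illustration under my own assumptions: the `Bookmark` class, the two search functions, and the sample data are all hypothetical, and this is plain Python rather than anything RawSugar actually runs on Lucene or PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    url: str
    tags: list[str]
    notes: str
    page_text: str


def search_tags_only(bookmarks, term):
    """Return URLs whose tags contain the term (tag-only search)."""
    term = term.lower()
    return [b.url for b in bookmarks if term in (t.lower() for t in b.tags)]


def search_all_fields(bookmarks, term):
    """Return URLs where the term appears in tags, notes, or page text."""
    term = term.lower()
    hits = []
    for b in bookmarks:
        words = " ".join([*b.tags, b.notes, b.page_text]).lower().split()
        if term in words:
            hits.append(b.url)
    return hits


# Made-up sample data for illustration only.
bookmarks = [
    Bookmark("http://example.com/lucene-intro", ["search", "java"],
             "good overview", "lucene is an indexing library"),
    Bookmark("http://example.com/crawler", ["crawler"],
             "notes on search engines", "how a crawler fetches pages"),
]

print(search_tags_only(bookmarks, "search"))
print(search_all_fields(bookmarks, "search"))
```

The second bookmark is never tagged "search", so only the broader search finds it through its notes. That extra recall is the selling point, but it also means the engine has to rank a lot more noise.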

Right now I would say RawSugar is more of an experiment than anything else -- it only has about 135,000 pages indexed (based on stop word tests my estimate is about 170k) and an undisclosed number of users. If it can scale and attract a sizable user base it could be something to watch. At the very least, it is an experiment to learn from.

Rollyo is another search engine using a more implicit approach to tagging. It allows people to create their "own custom verticals" by performing restricted searches across a collection of sites organized into a "Roll". One of the by-products of creating a roll is a human-created cluster of sites organized under an informative title and keywords. One of the biggest questions I have about Rollyo is: Can it scale? Users are currently limited to 20 sites in a roll and you can only search one roll at a time. Is being able to restrict a keyword search to a list of websites enough incentive to use the service? I'm not convinced -- I think there is a lot of potential, but will it catch on? What compelling new features does it offer to get people to switch?

The question that these and others are trying to answer is: How can search engines get users to tag web pages with usable content as a by-product of their daily surfing? What incentives motivate users to provide reliable and useful tags? And lastly, how can search engines handle spam in these tagging systems?

To sum it all up, I'm not sure if tagging will work. Right now I have more questions than answers -- and the questions are still fairly nebulous. I hope to refine these questions when I attend the WWW 2006 conference and hopefully attend the Collaborative Web Tagging workshop on May 21st. Raw Sugar, Yahoo, and other major players will be taking part, so I have high hopes for an interesting discussion. More on the confirmed speakers at the Raw Sugar Blog...

More reading:
Social Consequences of social tagging
There is also a paper available via the ACM on the ESP game:
"Labeling images with a computer game"

Become: a search engine for the holidays

Search Engine Watch recently ran an article on what's new for the 2005 shopping season. One important site that I think the author missed is Become. Incidentally, SEWatch does not provide a means to comment on this story, which is somewhat frustrating.

Become is a vertical search engine specialized in shopping. The growth of a shopping vertical isn't that surprising considering the explosion of e-commerce on the web, but there is lots of competition, especially from Yahoo Shopping and from new services like Froogle, which, on a side note, just started offering geo-targeted results.

Become has some experience from its start-up veteran founders: Michael Yang and Yeogirl Yun. They founded the comparison shopping site MySimon and sold it at the height of the dot-com boom to CNet for some serious cash -- smart people. If two people are going to take on shopping, at least they have some experience with comparison shopping -- but this time they decided to take their idea to the next level and combine research with comparison shopping.

Become has two modes -- shopping and research. As soon as I saw this it immediately said to me: Yahoo! Mindset. However, instead of a slider that dynamically filters results, they have created something much simpler. KISS -- not to mention that we CS geeks like binary choices. Interestingly enough, Globalspec also has the same two modes, although not as explicitly defined: research (The Engineering Web) and product search (SpecSearch).

The research side of Become is a 3+ billion page web index, purportedly emphasizing review and informational sites. The shopping mode is very much like MySimon, finding products from store feeds. I think the best new "cool" feature they have is the ability to start doing other types of Faceted Search, allowing filtering by features beyond price, on things like brand name. It would be cool if they could extend this to something more powerful -- like SpecSearch for consumers -- which would allow me to perform very precise product searches based on precise specifications. However, I may not be the typical user and perhaps a text search is good enough for most people.

I am going to be keeping my eye on Become. They have been hiring people with search engine expertise to enter the market. One thing the founders of Become have is chutzpah -- it takes guts to go head to head against Google's main product offerings. However, Become seems to be heading in the right direction by assembling an experienced leadership team culled from veterans of eBay, AltaVista, Overture, Yahoo, and Sun. For example, they recently hired Jon Glick as "Senior Director of Product Search and Comparison Shopping" (interesting job title, what does he really do?). From their press release:
Glick joins from Yahoo! (NASDAQ: YHOO) where he headed Product Management for Yahoo!'s web crawling, indexing, search relevancy, and assisted search initiatives. He was an instrumental part of the team that launched Yahoo!'s in-house web search in 2004, displacing Google (NASDAQ: GOOG).
While I was working on some side programming projects (more on these soon) I made the jump to Java 1.5. While perusing Sun's site, I ran across a very interesting article about Become and their creation of a large scale web crawler using Java 1.5. I had never heard of Become before, but the article was very informative and impressed me. From the article:
The company has successfully created a Java technology web crawler that may be the most sophisticated, massively scaled Java technology application in existence, obtaining information on over 3 billion web pages and writing well over 8 terabytes of data (and growing) on 30 fully distributed servers in seven days.
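As a back-of-the-envelope check on the figures in that quote, here is a rough sketch of the crawl rate they imply. It assumes the seven days covers the whole 3 billion page crawl (the quote does not say so explicitly), treats a terabyte as 10^12 bytes, and takes "over" and "well over" at their lower bounds.

```python
PAGES = 3_000_000_000            # "over 3 billion web pages"
BYTES_WRITTEN = 8 * 10**12       # "well over 8 terabytes", decimal TB assumed
SERVERS = 30                     # "30 fully distributed servers"
CRAWL_SECONDS = 7 * 24 * 60 * 60 # "seven days", assumed to cover the full crawl

pages_per_second = PAGES / CRAWL_SECONDS
pages_per_server_per_second = pages_per_second / SERVERS
bytes_per_page = BYTES_WRITTEN / PAGES

print(f"{pages_per_second:,.0f} pages/s across the cluster")
print(f"{pages_per_server_per_second:,.0f} pages/s per server")
print(f"{bytes_per_page:,.0f} bytes written per page")
```

Under those assumptions the cluster would be handling on the order of five thousand pages per second, or well over a hundred per server, while writing under 3 KB per page.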
I honestly don't believe their crawler is the most sophisticated massively scaled Java technology app in existence, but I won't start a rant on it. I would highly recommend the Sun article to everyone interested in Java 1.5 or web crawlers. Interestingly enough, I believe the Internet Archive is also using Java 1.5 for their Heritrix crawler on an AMD Opteron platform... but that's also another story. The real story is that this article prompted me to check out Become, and I thought it was especially relevant to share because of the holiday season.

It looks like they are serious, and they have the cash and the courage to do it. For now, I am going to go give it another go as I do my Christmas shopping.

Further reading:
Silicon Beat's coverage of the new SE from Feb 05.
SE Watch's Coverage of the April launch
Geeking with Greg on Yahoo! MindSet