Friday, February 15

Popfly and Yahoo! Pipes Mashup Creator Comparison

Jon Udell compares and contrasts Microsoft's Popfly with Yahoo! Pipes for mashup creation.

From his summary:
Pipes likes to aggregate, transform, and filter data feeds within the cloud, and can produce a few kinds of renderings in your browser. Popfly likes to aggregate, transform, and filter data feeds from the cloud, and can produce arbitrary renderings in your browser. They’re complementary because Popfly can consume and render data feeds coming from Pipes.
His comparison follows a piece on Popfly in Sunday's New York Times.

I think these mashup tools are important because they foreshadow the potential of "information integration." Information integration is the study of combining and summarizing information from multiple sources, often involving text or semi-structured data. It is #2 on Hector Garcia-Molina's priority list, after "Beyond Search"; see Erik Selberg's summary of his WSDM keynote.

Content aggregators and vertical search engines that get good at information integration will have a significant competitive advantage: they can "mash up" content from multiple sources to create new and useful value for users.
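To make the aggregate, filter, and transform pattern Udell describes a bit more concrete, here is a minimal Python sketch over in-memory feed items. This is not how Pipes or Popfly work internally or expose their features; `FeedItem`, `aggregate`, `filter_by_keyword`, and `render` are hypothetical names, and the sample feeds are made up. Deduplicating on the item link is just one simple assumption about what "combining" means; real information integration also has to reconcile entities and schemas across sources.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeedItem:
    # Hypothetical minimal feed entry; real RSS/Atom entries carry many more fields.
    title: str
    link: str
    published: date
    source: str


def aggregate(*feeds):
    """Merge several feeds, keeping only the first item seen for each link."""
    seen = set()
    merged = []
    for feed in feeds:
        for item in feed:
            if item.link not in seen:
                seen.add(item.link)
                merged.append(item)
    return merged


def filter_by_keyword(items, keyword):
    """Keep items whose title mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [item for item in items if needle in item.title.lower()]


def render(items):
    """Transform step: order newest first and format one text line per item."""
    ordered = sorted(items, key=lambda item: item.published, reverse=True)
    return [f"{item.published.isoformat()} [{item.source}] {item.title}" for item in ordered]


if __name__ == "__main__":
    # Made-up sample feeds for illustration only; link 2 appears in both.
    feed_a = [
        FeedItem("Popfly preview opens", "http://example.com/1", date(2008, 2, 10), "A"),
        FeedItem("Pipes adds new modules", "http://example.com/2", date(2008, 2, 12), "A"),
    ]
    feed_b = [
        FeedItem("Pipes adds new modules", "http://example.com/2", date(2008, 2, 12), "B"),
        FeedItem("Popfly and Pipes compared", "http://example.com/3", date(2008, 2, 14), "B"),
    ]
    merged = aggregate(feed_a, feed_b)
    for line in render(filter_by_keyword(merged, "pipes")):
        print(line)
```

Keeping each stage as a separate function mirrors the way both tools let you wire small operators together, so swapping in a different filter or renderer doesn't touch the rest of the pipeline.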

Monday, February 11

Microsoft core search PM quits to create a startup

Search Engine Land has an in-depth interview with Eytan Seidman, the program manager for the core crawling, indexing, and ranking team of Live Search. Don't miss the funny videos at the end!

The interview goes into detail on the history of Live Search and Eytan's key contributions since its inception.
...Microsoft began work on its own search engine in January 2003. Eytan was the first Program Manager on the new search project and the team started out with just five developers. When he joined the team, they had just passed their first major milestone of crawling 10-20 thousand documents. "We didn't yet know how to serve or rank, but we had started crawling, so we were excited," he recalls.
The new search team put together an end-to-end system of crawling/indexing/ranking/and serving with a 100 million document index they called "Little Dog". Even though the index was small, it all worked and it was the first time they'd had an end-to-end search engine that was entirely their own. By early 2004, Little Dog was up and running.
The article reports that Eytan is leaving to work on a startup with his Ariel (ironically, from Yahoo search) in New York.

Eytan, welcome to the Empire State.

Web Search and Data Mining 2008

WSDM 2008 is happening this week at Stanford.

Erik Selberg has coverage of Hector Garcia-Molina's keynote. It's a little vague (clearly a general outline), but Hector's priorities are interesting:

Hector's research priorities:

  1. Beyond Search
  2. Information Integration
  3. Monetizing
  4. Social Networking
  5. Coping with Scale
I'm curious to learn the details of what he said, especially on "Beyond Search." Also, according to Erik, Hector dissed personalization: "He also doesn’t like Personalization, as he doesn’t like things that change." I'm sure Greg took him up on that challenge.

Here are a few of my picks to read:

  • Crawl Ordering by Search Impact (Pandey and Olston from Yahoo! Research)

  • Beyond Basic Faceted Search (IBM Haifa)

  • Fast Learning of Document Ranking Functions with the Committee Perceptron (CMU LTI)

  • SoftRank: Optimizing Non-Smooth Rank Metrics (MSR Cambridge)

  • An Experimental Comparison of Click Position-Bias Models (MSR Cambridge)

  • Preferential Behavior in Online Groups (Yahoo! Research)

  • Finding High-Quality Content in Social Media (Eugene Agichtein and Yahoo! Research)

  • Can Social Bookmarking Improve Web Search? (Garcia-Molina and others from Stanford)
There are a lot of quality papers to read. However, a small bit of writing advice for authors: please don't make your abstract a thirteen-sentence paragraph that takes up the entire first column of your paper. If you can't be concise in summarizing your work, it's not worth my time to read it. If you ever catch me committing this type of writing offense, please, PLEASE, stop me. End rant.

Hopefully, I will have time to write up more detailed reviews of the papers later.