Wednesday, July 16

Amit on Google's Ranking Technology Part II: Give me what I want, now!

Amit has part two of his series on Google's ranking.

Search in the last decade has moved from give me what I said to give me what I want...
There's lots of interesting technology here that we take for granted every day and that you probably don't notice most of the time (except when it fails).

It's an interesting read. Don't miss his previous post on Google's philosophy behind ranking.

Globalspec ECIR 2008 Industry Day Presentation online

I finally got around to posting the slides from my presentation at ECIR 2008.

The Challenge of Engineering Vertical Search
Topic-specific search engines leverage structure from deep domain knowledge to provide better ranking with more powerful search capability than a general search engine. However, our experience at Globalspec is that realizing this vision is quite difficult. In this talk I will use Globalspec's search systems as a model and outline some of the challenges that make topic-specific search hard. I will also talk briefly about our experiences using open source search technology. Finally, I will explore challenging problems for future research and opportunities for academic-industry collaboration in vertical search.

Better late than never...

Tuesday, July 15

Cassandra: Facebook's Open Source Structured Data Store

Via James Hamilton of the Windows Live Platform Team; read his writeup.

Facebook has been open sourcing some good software recently, such as Thrift. Last week they announced they were open sourcing Cassandra; see the Google Code project.

Below is the SlideShare presentation from SIGMOD.

From what I read, and I need to investigate further, it supposedly has a BigTable-like data model running on a Dynamo-like infrastructure. It is written in Java.
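To make the "BigTable-like data model on a Dynamo-like infrastructure" idea a bit more concrete, here is a toy Python sketch. It is not Cassandra's code or API. The names `Ring` and `ToyStore`, the node names, and the replica count are all invented for illustration. It only assumes the general ideas: rows hold column families of timestamped columns, and row keys are placed on nodes by consistent hashing with the newest timestamp winning.

```python
import bisect
import hashlib
from collections import defaultdict


def _token(key: str) -> int:
    """Map a string to a position on the hash ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical Dynamo-style ring: a key is owned by the next nodes clockwise from its token."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.tokens = sorted((_token(n), n) for n in nodes)

    def owners(self, row_key):
        # Find the first node token after the key's token, then walk clockwise.
        start = bisect.bisect_right(self.tokens, (_token(row_key), ""))
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


class ToyStore:
    """Hypothetical store: row key -> column family -> column -> (timestamp, value)."""

    def __init__(self, nodes, replicas=2):
        self.ring = Ring(nodes, replicas)
        self.data = {n: defaultdict(lambda: defaultdict(dict)) for n in nodes}

    def put(self, row_key, family, column, value, timestamp):
        # Write to every replica; keep the cell with the newest timestamp.
        for node in self.ring.owners(row_key):
            columns = self.data[node][row_key][family]
            cell = columns.get(column)
            if cell is None or timestamp >= cell[0]:
                columns[column] = (timestamp, value)

    def get(self, row_key, family, column):
        # Read from every replica and return the newest value seen, or None.
        newest = None
        for node in self.ring.owners(row_key):
            row = self.data[node].get(row_key)
            if row is None:
                continue
            cell = row.get(family, {}).get(column)
            if cell is not None and (newest is None or cell[0] > newest[0]):
                newest = cell
        return newest[1] if newest else None


store = ToyStore(["node-a", "node-b", "node-c"], replicas=2)
store.put("user:42", "profile", "name", "Ada", timestamp=1)
store.put("user:42", "profile", "name", "Grace", timestamp=2)
print(store.get("user:42", "profile", "name"))
```

A real system has to deal with failures, membership changes, and on-disk storage, none of which this sketch attempts.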

Hadoop Summit and Data-Intensive Computing Symposium Videos and Slides

The slides and videos from the Hadoop Summit and Data-Intensive Computing Symposium at Yahoo!, held at the end of March, are online. I missed it completely because I was headed off to ECIR at the time. Better late than never...

There's some really fantastic-looking content; check it out if you missed it.

I also ran across James Hamilton's notes from the conference.

More on the videos later... Now if only SIGIR would post their videos online.

Autodesk Seek: Engineering CAD Search

Going back over the Hadoop Summit, I ran across a presentation by Mike Haley, Search Architect at Autodesk. The presentation, Online Search for Engineering Design Content [video, slides], looks highly relevant to search for engineering and technical professionals.

Mike's team developed Autodesk Seek. From his LinkedIn profile:
We developed a completely open system that includes:
- A custom parametric search engine (based on Lucene)
- Large scale distributed processing of design content (using Hadoop and Amazon Web Services)
- Machine Learning
- A vertical ontology system for assisting in specification and finding of design data
- Driving standards for design content and metadata

From their description:
Autodesk® Seek is a robust new web service that gives architects and engineers the ability to search, select, and specify generic or manufacturer-specific building products and associated design content, including 3D models, 2D drawings, specifications, and descriptions, from within their Autodesk software package or a standard Internet browser.
It launched back in May, but once again I missed it somehow. You can also read the product description on Autodesk's site.

I can't speak for the quality of the results since I'm not a design engineer. One question I have is content: is there a significant amount of quality content available? In other words, does Autodesk Seek have the parts and components that you want to find? I don't know the answer to that, but I know that CAD search is a hard problem.

I touched on CAD search as one of the interesting areas of research in my ECIR Industry Day talk at the beginning of April. If you're interested, my slides are online.