Wednesday, May 9

Octopart and SupplyFrame: Part Search Engines

Octopart is a new part search engine funded by Y Combinator.

It was started by two UC Berkeley physics grad student drop-outs, Sam and Andres. Octopart aggregates part data from major distributors: Newark InOne, Digi-Key, Allied Electronics, and Mouser (more to be added). It allows wildcard (very useful for part numbers), phrase, and boolean searches. The search results include pricing and availability comparison across the distributors, product images, product specs, and part data sheets. The UI is very reminiscent of Google. They get daily part feeds from Newark to keep their availability fresh. The engine is written in Python.
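Wildcard matching is simple to sketch. Here is a minimal, hypothetical illustration of wildcard part-number search using Python's standard `fnmatch` module; the catalog entries are made-up examples, not actual Octopart data or code:

```python
from fnmatch import fnmatch

# Toy catalog of part numbers (hypothetical examples only)
catalog = ["SN74HC14N", "SN74HC04N", "SN74LS14N", "LM317T"]

def wildcard_search(pattern, parts):
    """Return parts matching a shell-style wildcard pattern, case-insensitively.

    '*' matches any run of characters, '?' matches exactly one.
    """
    return [p for p in parts if fnmatch(p.upper(), pattern.upper())]

print(wildcard_search("SN74HC*", catalog))    # the whole SN74HC family
print(wildcard_search("SN74??14N", catalog))  # any logic family, same function
```

This is exactly the kind of query that's awkward in a plain keyword engine but natural for part numbers, where families share prefixes and the middle characters encode the logic family.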

TechCrunch has a write-up on them.

Overall, a very nice start. We'll see how they evolve and add features.

Another competitor in this field is SupplyFrame. SupplyFrame is more tightly integrated into the buying process, with tools (including desktop integration) for RFQ handling, Bills of Materials, etc... more geared for large-scale business buyers.

From a recent press release:
SupplyFrame takes component searching one step further by giving users the ability to easily create and manage lists of parts. With SupplyFrame interactive quoting tools, buyers and engineers can run parts lists or complete Bills of Materials through a full quoting cycle with any suppliers in the world.
It also offers pricing and lead-time trends for parts (the data still looks pretty sparse). See the trends in a search for: SN74HC14N.

There are other part search engines (ChipIndex, FindChips, etc...), but these are the two new contenders.

Disclaimer: my employer, Globalspec, is a part and components search engine.

Search Innovations Article on R/W Web

Read/Write Web has an article on the Top 17 Search Innovations Outside of Google. The article is broken up into 17 areas of innovation.

Globalspec is mentioned under number 7, parametric search:
GlobalSpec lets you specify a variety of parameters when searching for Engineering components (e.g. check out the parameters when searching for an industrial pipe). Parametric search is a natural feature for Vertical Search engines...Google has already incorporated this feature at a general level - such as the parameters on the Advanced Search page - but that waters down its usefulness. The most powerful use of this feature happens when additional parameters become available as you drill down further into standard search results or when you constrain the search to specific verticals.
Good press is always nice :-).

Parametric search (database-like) and Google-like text search are converging. Google is adding faceted search via Google Base, and the traditional SQL-like parametric engines are adding better full-text search. The goal is seamless integration between the two (a primary goal of faceted search). Faceted search engines like eBay Express are leading the way in this area.
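The convergence is easy to picture in code. Here is a minimal sketch of combining keyword search with parametric facet filters over the same records; the field names and part data are illustrative assumptions, not any vendor's actual schema:

```python
# Hypothetical part records; fields and values are made up for illustration.
parts = [
    {"desc": "hex inverter schmitt trigger", "package": "DIP-14", "price": 0.45},
    {"desc": "hex inverter", "package": "SOIC-14", "price": 0.30},
    {"desc": "adjustable voltage regulator", "package": "TO-220", "price": 0.60},
]

def search(records, text=None, **facets):
    """Filter records by a free-text keyword AND exact-match facet constraints."""
    hits = records
    if text:  # full-text side: substring match on the description
        hits = [r for r in hits if text.lower() in r["desc"].lower()]
    for field, value in facets.items():  # parametric side: field = value
        hits = [r for r in hits if r.get(field) == value]
    return hits

# Text alone finds both inverters; adding a facet narrows to one package.
search(parts, text="inverter")
search(parts, text="inverter", package="DIP-14")
```

A real faceted engine would also count the matches per facet value as you drill down, but the core idea is the same: text match and field constraints applied to one result set.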

The full list of the 17 areas:
  1. Natural language processing
  2. Personalization
  3. Canned, specialized searches (I'm not too sure about this one)
  4. New content types (Video, Images, Audio, etc...)
  5. Restricted data sources (aka 'custom' search engines like Google Co-op and Rollyo)
  6. Domain-specific search
  7. Parametric search
  8. Social search (Delicious, Stumbleupon, etc...)
  9. Human input (aka answers, Yahoo Answers)
  10. Semantic search (I'm not sure I agree, Spock and ZoomInfo, information extraction)
  11. Discovery support (aka recommendation systems)
  12. Classification, Tag clouds, and clustering (Is this innovation? Google uses clustering behind the scenes... I don't think users need or want this.)
  13. Results Visualization (new ways to display search results)
  14. Results refinement and filters (I'm not sure this is big either; it borders on faceted search the way it is described).
  15. Results Platforms (search result APIs? This is a bit strange. The closest example is the Alexa Web Platform).
  16. Related Services (I'm not sure I understand this..., not well defined)
  17. Search Agents
Overall, not a bad list, especially the top of the list. It seemed like they were stretching a bit towards the end...

Monday, May 7

SIGIR 2007 workshops and Learning to Rank for IR

The SIGIR 2007 (in Amsterdam) workshops were announced last Thursday.

Of particular interest is the Learning to Rank for Information Retrieval workshop. Papers are due by June 8th. From the description:

The task of "learning to rank" has emerged as an active and growing area of research both in information retrieval and machine learning. The goal is to design and apply methods to automatically learn a function from training data, such that the function can sort objects (e.g., documents) according to their degrees of relevance, preference, or importance as defined in a specific application.

The relevance of this task for IR is without question, because many IR problems are by nature ranking problems. Improved algorithms for learning ranking functions promise improved retrieval quality and less of a need for manual parameter adaptation. In this way, many IR technologies can be potentially enhanced by using learning to rank techniques.
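The pairwise flavor of the task described above can be sketched in a few lines. This is a toy perceptron-style illustration of learning a ranking function from preference pairs, under assumed two-dimensional feature vectors; it is not one of the LETOR baselines or any published algorithm in particular:

```python
# Toy pairwise learning-to-rank sketch. Each training example is a pair
# (preferred_doc, other_doc) of feature vectors; we learn weights so the
# preferred document scores higher.

def score(w, x):
    """Linear ranking function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, n_features, epochs=50, lr=0.1):
    """Perceptron-style updates on misordered pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            if score(w, better) <= score(w, worse):
                # Nudge weights toward the preferred document's features.
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

# Hypothetical data: feature 0 (say, a BM25-like score) tracks relevance.
pairs = [([0.9, 0.1], [0.2, 0.8]), ([0.7, 0.3], [0.1, 0.5])]
w = train(pairs, n_features=2)
```

After training, `score(w, x)` sorts documents, which is the point: the learned function induces a ranking rather than a per-document label.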

A major theme at the workshop will be, of course, LETOR, MSR Asia's collection of datasets for comparing these types of machine-learning-based ranking systems. See my previous post on LETOR.

The LETOR website now has some critical bug fixes posted on the first version and a formal release is planned for the end of the month (according to the website).

Sunday, May 6

Powerset and Natural Language Search at UW

Barney Pell, from Powerset, spoke at UW last week on Powerset and Natural Language Search. From the abstract:

In this talk, we discuss the concept of natural language search. Central to this is a new user experience, in which users express queries in natural language and the system responses respect the linguistic information in the query.

To realize this vision at broad scope and scale will require advances in a variety of technology areas, including natural language processing, information extraction, knowledge representation, and large-scale search indexing and retrieval systems.

In addition, it will require innovations in user interface. Issues include changing user behavior, educating users about the affordances and constraints of the technology, supporting users in formulating effective queries, and managing expectations.

Hopefully, the video will be online soon at UWTV.

Also in the same lecture series, Raghu Ramakrishnan from Yahoo Research gave an interesting lecture in February: Community Systems: The World Online.

Is Relevance Relevant?

Elizabeth van Couvering, a PhD student at the London School of Economics recently published a paper: Is Relevance Relevant? Market, Science, and War: Discourses of Search Engine Quality. From her abstract:
The evidence presented here suggests that resources in search engine development are overwhelmingly allocated on the basis of market factors or scientific/technological concerns. Fairness and representativeness, core elements of the journalists' definition of quality media content, are not key determiners of search engine quality in the minds of search engine producers. Rather, alternative standards of quality, such as customer satisfaction and relevance, mean that tactics to silence or promote certain websites or site owners (such as blacklisting, whitelisting, and index "cleaning") are seen as unproblematic.

There is good discussion of her article on John Battelle's post on the topic, including follow-ups from Matt Cutts of Google and from Elizabeth herself.

Oshoma Momoh, formerly a GM of MSN Search, also commented on the article on his blog.

Ms. van Couvering also previously published Web Behavior: Search Engines in Context.