Background: The beginning
I have seen several resources, such as this history of search engines, that wrongly identify MetaCrawler as the first web meta-search engine. The first was in fact SavvySearch, though not by a wide margin. SavvySearch and MetaCrawler were both university research projects released within months of one another: SavvySearch in March 1995 and MetaCrawler in July of the same year. The bottom line is that they were both in development at the same time, in 1994, and were released to the public in '95.
SavvySearch was a research project out of Colorado State University that attempted to provide a centralized interface to web search engines and specialized databases through intelligent service selection. It provided users with a "search plan" to execute their query against the most relevant subset of engines and databases. Its goal was to balance two conflicting objectives: "minimizing resource consumption and maximizing search quality." SavvySearch attempted to do back in 1994 what Gary Price describes as the future in the aforementioned SEWatch article,
For a long time I've said verticals will continue to grow in popularity and importance as meta search tools which are getting better all of the time will allow various database and content publishers to offer material (free or fee) to end users who will select these databases at the time of their search based on their information need.

Like Gary's vision, SavvySearch searched not only web content, but also specialized databases like Roget's Thesaurus, CNET's Shareware.com, and the IMDb. SavvySearch's raison d'être was that no search engine, or even a group of engines, was large enough to contemplate crawling the entire web. The major engines (ALIWEB, WebCrawler, Lycos, Yahoo, etc.) lacked good coverage of the web. Furthermore, this was before major specialty sites started creating crawler-friendly database-driven pages or provided database feeds directly for indexing. SavvySearch's sibling, MetaCrawler, also addressed the coverage issue, but instead of focusing on intelligent service selection, it focused on problems of freshness and relevance.
MetaCrawler was a project out of the University of Washington by graduate student Erik Selberg (advised by Oren Etzioni) that tackled not only recall but also the staleness and (ir)relevance of results from the search engines of the day. Unlike SavvySearch, it was a pure web search engine, although one of its planned future projects was to extend it to include databases. While it did attempt to solve recall problems, its primary aim was to "verify" pages by fetching them to ensure their existence and freshness.
MetaCrawler saved the user time and work by querying six search engines: Galaxy, Infoseek, Lycos, Open Text, WebCrawler, and Yahoo!. It then de-duped and fetched all of the pages returned. That's right, it fetched every page returned by every engine! It "verified" results, eliminating dead pages as well as modified pages that were no longer relevant. According to their paper, on average 14.88% of search results were removed because they were "dead." In addition to removing dead pages, MetaCrawler re-scored the remaining pages against the query terms and removed pages that had changed since the source engine originally indexed them.
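The verify-and-rescore pipeline described above can be sketched roughly as follows. This is a toy reconstruction, not Selberg's actual code: the term-frequency scoring, the URLs, and the simulated fetcher are all illustrative assumptions.

```python
from collections import Counter

def verify_and_rescore(results_by_engine, query_terms, fetch):
    """De-duplicate URLs returned by several engines, fetch each page,
    drop dead links, and re-score survivors by query-term frequency."""
    # Union of results across engines, de-duplicated by URL.
    urls = {url for results in results_by_engine.values() for url in results}
    scored = []
    for url in urls:
        page = fetch(url)          # returns page text, or None if dead
        if page is None:
            continue               # "verification" removes dead pages
        counts = Counter(page.lower().split())
        score = sum(counts[t.lower()] for t in query_terms)
        if score > 0:              # page no longer mentions the query: drop it
            scored.append((url, score))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# A toy "web" standing in for live HTTP fetches.
WEB = {
    "http://a.example/hit": "meta search engines combine search results",
    "http://b.example/stale": "this page changed and no longer matches",
}

def toy_fetch(url):
    return WEB.get(url)  # None simulates a dead (404) link

ranked = verify_and_rescore(
    {"lycos": ["http://a.example/hit", "http://c.example/dead"],
     "infoseek": ["http://a.example/hit", "http://b.example/stale"]},
    ["search", "results"],
    toy_fetch,
)
# The dead link and the stale (no longer matching) page are both removed.
```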
In the process of re-scoring the pages it fetched, it also generated query-sensitive page snippets. Back in this Precambrian era of search, there were no query-sensitive summaries; it was too expensive to store the cached content. Instead, search engines provided a list of URLs with a query-independent description of each page. Users were left to hunt and poke to discover how a page related to their query in more depth. MetaCrawler improved the perceived relevance of search because users could more easily understand why a result was returned. Selberg describes this list of "references": "Each reference contains a clickable hypertext link to the reference, followed by local page context (if available), a confidence score, verified keywords, and the actual URL of the reference. Each word in the search query is automatically boldfaced." This feature would not be duplicated again (to my knowledge) until 1999, when Google released their search engine.
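A query-sensitive snippet with boldfaced query words, in the spirit of the feature Selberg describes, might look like this minimal sketch. The windowing heuristic and the HTML `<b>` markup are my assumptions, not details from the paper.

```python
import re

def snippet(page_text, query_terms, width=40):
    """Return a short excerpt centred on the first query-term hit,
    with each query word boldfaced (HTML <b> tags)."""
    lower = page_text.lower()
    # Find the earliest occurrence of any query term.
    hits = [h for h in (lower.find(t.lower()) for t in query_terms) if h >= 0]
    if not hits:
        return page_text[:width]   # fall back to a query-independent preview
    start = max(0, min(hits) - width // 2)
    excerpt = page_text[start:start + width]
    # Boldface every query word in the excerpt, case-insensitively.
    for t in query_terms:
        excerpt = re.sub(re.escape(t), lambda m: f"<b>{m.group(0)}</b>",
                         excerpt, flags=re.IGNORECASE)
    return excerpt
```

Unlike the engines of the day, a meta-searcher that has already fetched the page can build this excerpt on the fly, with no cached copy required.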
The biggest drawback to MetaCrawler was that it didn't store cached pages. In contrast with SavvySearch's emphasis on economy of resources, MetaCrawler was very bandwidth- and time-intensive. Its fetcher was highly optimized so that it could download over 4,000 pages simultaneously. Quite an accomplishment! However, fetching every result for every query doesn't scale, even with the benefits of caching. Instead, Selberg proposed that MetaCrawler could be a client-side application and that ISPs could provide caching to speed page fetch time. However, this approach would have required substantial client-side bandwidth in the pre-broadband era. Even with MetaCrawler's highly optimized fetcher, a query took over two minutes on average to verify all of the pages (page 8, table 4).
A brief comparison
SavvySearch searched up to 20 engines at once, while MetaCrawler queried only six. SavvySearch included topic-specific directories and databases, while MetaCrawler only searched web search engines. MetaCrawler was slower, but more reliable, than SavvySearch. Because MetaCrawler fetched all pages, it could support more advanced query functionality, such as the minus query operator and restriction to a country or a particular domain name extension. SavvySearch, on the other hand, did not support advanced query formats. Because it did no processing of pages itself, it was reduced to using the lowest common denominator. Neither provided a way to leverage the full advanced query power offered by most engines.
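One way a fetching meta-searcher can support operators the underlying engines lack is to apply them client-side after download, which is what made MetaCrawler's minus operator and domain restrictions possible. A minimal sketch, where the function name and filtering rules are hypothetical:

```python
from urllib.parse import urlparse

def post_filter(results, minus_terms=(), domain_suffix=None):
    """Apply advanced query operators client-side, as a meta-searcher
    that has already fetched every page can: drop pages containing any
    minus term, and keep only hosts under a given domain extension."""
    kept = []
    for url, text in results:
        if any(t.lower() in text.lower() for t in minus_terms):
            continue  # the "-term" operator, applied to full page text
        host = urlparse(url).hostname or ""
        if domain_suffix and not host.endswith(domain_suffix):
            continue  # e.g. restrict results to ".edu" hosts
        kept.append(url)
    return kept
```

Because the filtering runs over the fetched page text rather than each engine's query syntax, it works uniformly regardless of which engine returned the result.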
A primary reason for both of these engines' births was that in the era before Google, and even before AltaVista, having a single engine that provided even modest coverage of the web seemed impossible. MetaCrawler's creators Selberg and Etzioni write,
Skeptical readers may argue that service providers could invest in more resources and provide more comprehensive indices to the web. However, recent studies indicated the rate of Web expansion and change makes a complete index virtually impossible.

They cite an interesting source: two researchers at CMU who helped develop the Lycos search engine. Mauldin and Leavitt write in their paper Web Agent Related Research at the Center for Machine Translation: "First, information discovery on the web (including gopher-space and ftp-space) is now (and will remain) too large a task...the scale is too great for the use of a single explorer agent to be effective." Advances in storage technology and cheap bandwidth now allow Googlebot and other search engine crawlers to do just that.
SavvySearch and MetaCrawler paved the way both for today's search engines and for the next generation of meta-search engines. MetaCrawler was purchased by InfoSpace and continues to operate as a meta-search engine, but bears little resemblance to its former self. It provided a platform for research on the next generation of meta-search engines: HuskySearch, which researched AI applications to query refinement, and Grouper, which explored document clustering in meta-search. MetaCrawler's dynamic summaries are now the de facto standard, with Google a primary pioneer in bringing them to the masses. The problems these search engines attempted to address, the proliferation of search engines and the lack of stability and coverage in their results, persist today. There are more engines than ever, and there are still significant differences between the results of even the major engines. In future parts of this series we'll take a closer look at some of these problems and at how today's meta-search engines try to address them.
References and Resources
 E. Selberg and O. Etzioni. Multi-Service Search and Comparison Using the MetaCrawler, 1995.
 D. Dreilinger and A. Howe. Experiences with Selecting Search Engines using Meta-Search, 1997.
 MetaCrawler, HuskySearch, and Grouper.
 Sonnenreich. A History of Search Engines.