Cuil's plans to differentiate itself
1) It's about the infrastructure, of course.
From a recent interview with GigaOm, Anna Patterson, formerly one of Google's infrastructure designers, reportedly said:
How it works is that company has an index of around 120 billion pages that is sorted on dedicated machines, each one tasked with conducting topic-specific search — for instance, health, sports or travel. This approach allows them to sift through the web faster (and probably cheaper) than Google...The Forbes article has a little more detail on their query serving architecture:
Finally, to compare with Google's architecture, a quote from Danny Sullivan's interview with Anna:
Patterson and Costello's impressive feat is that they've done this with a total of 1,400 eight-CPU computers (1,000 find and data-mine Web pages, the remaining 400 serve up those pages) [JD: Even assuming there is no redundancy 120 billion docs / 400 servers = 300 million documents per node. This seems unrealistically high, especially considering that Lucene, a widely used search library can realistically handle 10-20 million.] ...
Cuil attempts to see relationships between words and to group related pages in a single server. Patterson says this enables quicker, more efficient searching: "While most queries [at competitors] go out to thousands of servers looking for an answer, 90% of our queries go to just one machine."
If they [Google] wanted to triple size of their index, they'd have to triple the size of every server and cluster. It's not easy or fast...increasing the index size will be 'non-trivial' exerciseAccording to the news, Cuil's index serving infrastructure is a key competitive advantage over Google and the other major players. It remains to be seen if they can leverage this platform to produce world-class results.
On their size claims
Last I heard, Google's index is rumored to be in the 40 billion range, Microsoft is in the 10-20+ billion range. Cuil claims their architecture allows at least a 3x increase in index size over Google. However, it's hard to verify this because Cuil's hit counts are badly broken: a search for [the] returns an estimated 250 documents. The lack of support for advanced search, such as site: also makes it difficult to compare coverage of individual sites, such as Wikipedia.
Other differentiating features:
- Topic-specific ranking
From Danny's interview, it sounds like Cuil is doing post-retrieval analysis of document content, analyzing phrase co-occurrence and extracting 'concepts'. From the interview:
It figures out these relationships by seeing what type of words commonly appear across the entire set of pages it finds. Since "gryffindor" appears often on pages that also say "harry potter," it can tell these two words (well, three words -- but two different query terms) are related.
Cuil then reportedly computes a topic specific link score. It sounds very similar to Teoma's HITS technology. Again, there is no support (yet) for Cuil's claim that this is superior to other search approaches.
- UI and exploration
Cuil has a non-standard two or three column layout of results, which attempts to feel more like a newspaper, with images associated with many results.
It appears to use the information from the content analysis to create the 'Explore by Category' box to drill down into specific topics, as well as offering related searches as tabs across the top of the page.
Size matters, but it's more important to get the right content in the index. It's purely subjective since the web is infinite, but only a subset of the web is useful. Google tracks at least a trillion distinct URLs, and Cuil's crawls only a mere 186 billion (SE Land reference). It's critical that crawling and indexing be prioritized correctly. For example, despite the reported massive index size, this blog is not indexed by Cuil. While on its own this doesn't mean much, Daniel reports similar experiences with lack of coverage of his content.
I am unimpressed with Cuil's current coverage and relevance, but it's still early. Despite all the criticism (much of it justified), launching a search engine of this scale is an impressive feat. I think what Cuil's doing is exciting and I'm witholding judgment until it has time to mature. Once again, congratulations to the Cuil team and good luck with the long road ahead.